The  purpose  of  the  present  paper  is  to  describe  developmental  changes  in
paralinguistic  sensitivity  as  a function  of  a domain-general  constraint  on  mental 
representations  and  a domain-specific  constraint  on  the  processing  of  prosody 
and  paralanguage.  Listeners  were  28  4-year-old,  28  7-year-old,  and  28  10-year- 
old  native  speakers  of  American  English  with  normal  hearing.  Listeners  heard 
utterances  in  three  contexts:  low-pass  filtered,  reiterant,  and  full,  unaltered  speech.
In  each  context,  half  of  the  utterances  contained  consistent  lexical  and 
paralinguistic  information  about  the  speaker’s  affective  state  and  half  contained 
discrepant  lexical  and  paralinguistic  information.  Children  rated  the  affective  state 
of  the  speaker  as  either  happy  or  angry  in  a single  alternative  forced-choice 
procedure. 
Four-year-olds  were  sensitive  to  affective  variations  in  reiterant  speech  but
insensitive  to  affective  variations  in  filtered  speech,  and  when  faced  with  affective 
discrepancy  between  lexical  and  paralinguistic  cues,  4-year-olds  were  unable  to 
selectively  attend  to  information  carried  paralinguistically.  Seven-year-olds’ 
sensitivity  to  paralinguistic  cues  in  discrepant  full  speech  was  significantly  greater 
than  that  of  4-year-olds.  About  one-half  of  the  listeners  in  this  age  range 
performed  similarly  to  4-year-olds  (i.e.,  basing  their  attributions  primarily  on  lexical 
content)  while  the  other  half  performed  more  similarly  to  10-year-olds  (basing  their
attributions  primarily  on  paralinguistic  cues).  Continued  increments  in  sensitivity
to  paralinguistic  variations  in  filtered  speech  and  in  the  presence  of  competing
lexical  information  were  observed  for  the  10-year-old  sample.

The  results  provide  preliminary  support  for  the  proposed  cognitive  and 
linguistic  constraints.  The  proposed  nature  of  these  constraints  and  the  relation 
of  this  approach  to  models  of  developmental  speech  perception  and  speech 
prosody  are  discussed  with  an  emphasis  on  the  productiveness  of  an  integrative 
approach  to  developmental  issues. 


CHAPTER  1 
INTRODUCTION 

The  purpose  of  the  present  paper  is  to  examine  children’s  utilization  of 
acoustic  cues  to  speaker  affect  (vocal  paralanguage)  in  the  presence  of 
competing  lexical  information.  A consistent  finding  has  been  that  children  tend 
to  discount  acoustic  cues  to  affect  when  competing  lexical  information  is 
available.  (For  example,  the  sentence,  "You’re  my  favorite  person"  read  in  an
angry  voice  would  be  interpreted  as  "very  happy"  by  many  young  children.) 

Two  general  explanations  have  been  offered  for  this  effect.  First,  it  has  been 
suggested  that  vocal  paralanguage  is  less  systematic  and  more  subtle  than  lexical 
information  and  that  the  ability  to  utilize  paralanguage  increases  gradually  with 
development.  A second  explanation  posited  for  lexical/paralinguistic  interference 
is  that  discrepancies  between  lexical  and  paralinguistic  cues  constitute  instances 
of  sarcasm  or  joking  and  that  young  children  do  not  understand  these  complex 
forms  of  communication.  However,  recent  research  in  developmental 
psychoacoustics  and  speech  perception  calls  into  question  the  adequacy  of  these 
accounts.  First,  there  is  evidence  that  sensitivity  to  vocal  paralanguage  is  present 
from  birth,  in  contrast  to  the  hypothesis  that  this  sensitivity  develops  gradually. 
Second,  the  interference  of  lexical  cues  with  the  interpretation  of  other  acoustic 
information  appears  to  be  a general  developmental  phenomenon  and  not  limited 
to  the  detection  of  speaker  affect.  An  alternative  account,  consistent  with  current
research  in  developmental  psychoacoustics,  speech  perception,  language 
acquisition,  and  cognitive  development  is  proposed  in  the  present  paper. 

Of  particular  interest  is  whether  developmental  changes  in  the  relative 
weighting  of  lexical  and  paralinguistic  cues  can  be  predicted  on  the  basis  of  two 
general  constraints,  one  linguistic  and  one  cognitive.  The  linguistic  constraint  is 
that  many  of  the  acoustic  parameters  associated  with  vocal  affect  (e.g., 
fundamental  frequency,  duration,  and  intensity)  may  play  a predominantly 
linguistic  role  early  in  development  by  highlighting  clause  boundaries, 
grammatical  modifications,  and  new  lexical  items.  The  cognitive  constraint  is  that 
young  children  (before  the  age  of  five)  demonstrate  considerable  difficulty  in 
simultaneously  representing  objects  or  events  in  two  conflicting  ways. 

The  combination  of  these  constraints  can  be  expected  to  influence  young 
children’s  judgments  of  discrepant  auditory  stimuli  in  the  following  ways:  First, 
the  cognitive  constraint  predicts  that  young  children  will  attend  to  only  a single 
stimulus  representation  (i.e.,  either  lexical  or  paralinguistic).  The  linguistic
constraint  predicts  that  young  children  will  be  biased  toward  linguistic,  as 
opposed  to  paralinguistic,  processing  of  acoustic  variables.  As  a result,  young, 
language-learning  children  would  be  expected  to  make  attributions  of  speaker 
affect  consistent  with  the  lexical  content  of  discrepant  utterances. 


In  the  present  paper,  the  literature  on  developmental  changes  in  sensitivity  to 
vocal  paralanguage  will  be  presented  followed  by  a review  of  the  literature  on 
affect  detection  in  cases  of  discrepancy.  Next,  evidence  for  the  proposed 
linguistic  and  cognitive  constraints  will  be  presented  in  two  sections:  first,  the  role 
of  acoustic  parameters  in  speech  prosody  and  their  relevance  to  early  language 
acquisition  will  be  addressed  and,  second,  the  implications  of  processing 
limitations  such  as  the  dual  representation  problem  for  the  detection  of  affect  in 
discrepant  messages  will  be  discussed.  Finally,  the  adequacy  of  the  proposed 
constraints  will  be  assessed  in  a cross-sectional  study  of  children’s  attributions 
of  speaker  affect. 


CHAPTER  2 

REVIEW  OF  THE  LITERATURE

Sensitivity  to  Vocal  Paralanguage

Following  Garnica  (1987),  a distinction  will  be  made  throughout  this  paper 
between  the  prosodic  and  paralinguistic  functions  of  acoustic  variables.  When 
acoustic  variables  mark  lexical  and  grammatical  variations  they  will  be  referred  to 
as  prosodic,  and  when  they  convey  affective  information,  they  will  be  referred  to 
as  paralinguistic.  Both  prosody  and  paralanguage  are  composed  of  the  same
group  of  primary  acoustic  correlates,  namely,  fundamental  frequency  (F0,
perceived  as  pitch),  duration,  and  intensity.

A classic  illustration  of  the  role  of  acoustic  variables  in  conveying  affective 
information  was  provided  by  Williams  and  Stevens  (1972)  in  their  acoustic 
analyses  of  the  voice  of  a New  Jersey  radio  announcer  during  the  crash  of  the 
Hindenburg.  Spectrographs  of  the  announcer’s  speech  before  and  after  the  crash 
reveal  changes  in  several  acoustic  variables.  In  particular,  there  is  an  upward 
shift  in  F0,  increased  duration  of  specific  lexical  units,  and  variation  in  formant
structure  (the  concentrations  of  energy  at  specific  frequencies  that  are  critical  for 
speech  comprehension).  Presumably  these  changes  reflect  the  affective  arousal 
associated  with  observing  such  a  tragic  event.

There  is  considerable  evidence  that  humans  are  sensitive  to  changes  in
affective  state  conveyed  by  the  voice.  And  of  course,  from  an  ethological 
perspective,  it  is  not  surprising  that  vocal  affect  should  have  an  important 
signalling  function  for  humans  as  it  does  for  other  primates.  In  this  section,  the 
literature  on  the  encoding  and  decoding  of  vocal  paralanguage  will  be  reviewed 
with  particular  attention  to  developmental  changes  in  sensitivity. 

Adults 

Adult  listeners  are  quite  good  at  detecting  the  affect  expressed  by  vocal 
paralanguage.  However,  it  has  been  difficult  to  specify  the  way  in  which  acoustic 
parameters  covary  with  changes  in  affect.  Even  so,  several  cues  (or  patterns  of 
cues)  have  been  isolated  which  appear  to  be  involved  in  the  production  and 
interpretation  of  affective  signals. 

In  one  of  the  earliest  experimental  manipulations  of  paralinguistic  cues, 
Lieberman  and  Michaels  (1962)  synthesized  several  utterances  read  by  male
speakers  in  different  "emotional  modes."  These  "modes"  corresponded  to  states 
of  boredom,  confidentiality,  disbelief,  fear,  happiness,  and  pomposity.  By  using 
synthetic  speech,  the  experimenters  were  able  to  evaluate  the  independent 
contributions  of  F0  and  intensity  to  judgments  of  speaker  state.  Adult  judgment
accuracies  of  approximately  85%  were  obtained  when  the  original  speech  signal
was  presented  but  were  reduced  to  47%  when  F0  and  intensity  were  presented
without  lexical  content.  Providing  only  F0  information  resulted  in  accuracies  of
approximately  44%.  Further  reductions  in  accuracy  were  obtained  when
perturbations  in  the  F0  contour  were  smoothed.  The  researchers  concluded  that
fine  and  gross  F0  structure,  intensity,  and  phonetic  structure  all  contributed  to  the
encoding  and  decoding  of  affective  information  in  speech. 

However,  there  are  two  important  limitations  to  this  research.  First,  it  was 
extremely  difficult  to  generate  natural  sounding  synthetic  speech.  The  substantial 
decrement  in  performance  from  the  original  to  the  synthetic  stimuli  (in  which  F0
and  intensity  parameters  were  manipulated)  may  be  due,  in  part,  to  the
unnaturalness  of  the  synthetic  stimuli  and  not  to  the  lack  of  phonetic  information.
Second,  the  "emotional  modes"  identified  do  not  correspond  well  to  current
emotion  theory.  For  example,  confidentiality,  boredom,  and  disbelief  may  be
better  examples  of  cognitive,  than  of  affective,  states.  The  most  likely  result  of
these  limitations  would  be  to  attenuate  the  magnitude  of  effects  obtained  from  the
manipulation  of  F0  and  intensity  cues.

In  a comprehensive  study  of  the  effect  of  acoustic  variables  on  listeners’ 
affective  judgments,  Scherer  and  Oshinsky  (1977)  manipulated  intensity,  F0
(variation,  level,  and  contour),  tempo,  envelope,  and  number  of  harmonics  in  a 
series  of  synthetic  tone  sequences.  Regression  analyses  revealed  that  linear 
combinations  of  these  cues  account  for  two-thirds  to  three-fourths  of  the  variance 
in  listeners’  affective  judgments.  For  example,  the  acoustic  parameters  associated 
with  a  judgment  of  fear,  in  order  of  predictive  strength,  are  wide  F0  contour,  fast
tempo,  many  harmonics,  high  F0  level,  round  amplitude  envelope,  and  small  F0
variation.

Ladd,  Silverman,  Tolkmitt,  Bergmann,  and  Scherer  (1985)  digitally
resynthesized  speech  to  determine  the  extent  to  which  voice
quality  (the  distribution  of  energy  across  the  frequency  spectrum),  F0  range,  and
F0  contour  provide  independent  cues  to  speaker  affect.  Through  the  resynthesis
technique,  both  F0  contour  and  range  could  be  systematically  manipulated  within
the  context  of  naturally  spoken  utterances.  Voice  quality  was  manipulated  by 
instructing  the  speaker  to  produce  different  vocal  expressions.  Subjects’ 
attributions  of  speaker  arousal  were  measured  on  five  bipolar  scales 
(relaxed/aroused,  open/deceitful,  annoyed/content,  insecure/arrogant,  and 
indifferent/involved).  Voice  quality  and  F0  range  accounted  for  93%  of  the
variance  in  these  judgments.  In  particular,  voice  quality  appears  related  to  the
positive/negative  valence  of  subjects’  attributions  while  F0  range  is  related  to  level
of  arousal. 

The  use  of  naturally  spoken  speech  to  obtain  affective  judgments  is 
problematic,  however,  because  listeners  may  use  the  semantic  valence  of  lexical 
content  as  a cue.  Two  methods  for  isolating  acoustic  information  and  reducing 
or  eliminating  lexical  content  are  low-pass-filtering  (Rogers,  Scherer,  & Rosenthal, 
1971)  and  random-splicing  (Scherer,  1971).  Each  procedure  preserves  some 
acoustic  information  while  disrupting  or  degrading  others.  For  example,  low-pass 
filtering  preserves  the  F0  contour  but  reduces  spectral  content  and  attenuates
perceived  loudness.  Similarly,  while  random-splicing  preserves  spectral  content,
it  disrupts  temporal  organization  and  the  F0  contour.  In  each  case,  the  resulting
signal  is  an  incomplete  representation  of  the  paralinguistic  features  of  the  original 
utterance. 
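
As a rough illustration of why each masking procedure yields only a partial picture of the paralinguistic signal, the following Python sketch applies both to a list of waveform samples. It is not the procedure used in the studies cited above: the one-pole filter, the fixed segment length, and the names low_pass, random_splice, rms, and tone are all assumptions introduced here for illustration.

```python
import math
import random


def low_pass(samples, sample_rate, cutoff_hz):
    """One-pole (RC-style) low-pass filter: attenuates energy above cutoff_hz.

    Assumption: a simple first-order filter stands in for whatever
    filter the original studies used.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)  # smooth toward the new sample
        out.append(prev)
    return out


def random_splice(samples, segment_len, seed=0):
    """Cut the signal into fixed-length segments and shuffle their order.

    Every sample is kept, so spectral content within a segment survives,
    but the temporal order (and hence any contour) is destroyed.
    """
    segments = [samples[i:i + segment_len]
                for i in range(0, len(samples), segment_len)]
    random.Random(seed).shuffle(segments)
    return [s for segment in segments for s in segment]


def rms(samples):
    """Root-mean-square amplitude, used here as a crude energy measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, n_samples):
    """Pure sine tone, a hypothetical stand-in for one spectral component."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]
```

Under these assumptions, filtering a 150 Hz tone (a frequency in the region of a speaker's F0) with a 400 Hz cutoff leaves most of its energy intact, whereas a 3000 Hz component is strongly attenuated. Splicing keeps every sample but reorders the segments, which is why the contour does not survive it.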

Friend  (Friend,  1991;  Friend  & Farrar,  in  press)  developed  a procedure, 
reiterant  speech,  for  use  in  establishing  a baseline  of  listeners’  sensitivity  to  vocal 
paralanguage  in  the  absence  of  lexical  information.  The  procedure  was  based 
on  previous  work  in  speech  prosody  (Liberman  & Streeter,  1978;  Nakatani  & 
Schaffer,  1977).  Reiterant  stimuli  were  generated  by  systematically  replacing  the 
syllables  of  the  original  utterance  with  nonsense  syllables  which  produce  a similar 
F0  contour.  After  they  were  generated,  the  reiterant  stimuli  were  compared  with
the  original  stimuli  on  the  acoustic  parameters  F0  (mean,  contour,  range)  and
intensity  (level  and  variance). 

Listeners  were  significantly  more  accurate  in  their  affective  judgments  of 
reiterant  stimuli  than  in  their  judgments  of  either  low-pass  filtered  or  random- 
spliced  stimuli.  In  fact,  listeners  evidenced  a bias  toward  ratings  of  anger  in  the 
random-spliced  condition.  Since  the  primary  difference  between  this  condition 
and  the  reiterant  condition  was  the  disruption  of  temporal  information,  this  finding 
provides  further  support  for  the  role  of  the  F0  contour  and  temporal  factors  (e.g.,
duration)  in  decoding  speaker  affect.  In  addition,  although  listeners  reported  that 
it  was  difficult  to  judge  speaker  affect  in  the  absence  of  lexical  information,
judgments  of  reiterant  speech  were  still  quite  good  (79%  to  95%  correct) 
suggesting  that  specific  phonetic  information  is  not  critical  to  the  detection  of 
vocal  affect. 

Two  general  conclusions  follow  from  this  research.  First,  adult  listeners  are 
quite  accurate  in  judging  the  affect  of  a  speaker  based  on  vocal
paralanguage.  Second,  there  are  numerous  acoustic  cues  (including  F0,  duration,
and  intensity)  which,  while  they  do  not  appear  invariant,  are  associated  in  a 
systematic  way  with  the  encoding  and  decoding  of  vocal  affect.  In  the  remainder 
of  this  section,  developmental  trends  in  the  ability  to  make  discriminations  based 
upon  paralinguistic  cues  will  be  reviewed. 

Infants 

Work  in  developmental  psychoacoustics  has  indicated  that  infants  readily 
discriminate  some  of  the  cues  which  have  been  shown  in  the  adult  literature  to 
be  relevant  to  the  encoding  and  decoding  of  vocal  paralanguage.  Clarkson  and 
Clifton  (1985)  used  a visually  reinforced  operant  head-turn  procedure  to  test  the 
ability  of  7-  and  8-month-olds  to  discriminate  tonal  complexes  on  the  basis  of  their 
fundamental  frequencies.  Infants  evidenced  greater  differential  responding  to 
complexes  which  had  different  harmonic  components  and  a  different  F0  but  not
to  complexes  which  had  different  harmonic  components  but  the  same  F0.  Infants’
pitch  perception  at  7  months  was  similar  to  that  demonstrated  by  adults.

Using  a  similar  paradigm,  Clarkson,  Clifton,  and  Perris  (1988)  showed  that  7-
and  8-month-olds  were  also  able  to  categorize  tonal  complexes  on  the  basis  of 
their  spectral  content.  Recall  that,  in  the  first  study  (Clarkson  & Clifton,  1985), 
infants  were  trained  to  categorize  on  the  basis  of  F0  and  to  ignore  variations  in
spectral  content.  In  this  study,  infants  were  trained  to  do  just  the  opposite,  to
categorize  tonal  complexes  on  the  basis  of  their  spectral  components  and  to
ignore  F0.  Together,  these  experiments  suggest  that  infants  are  indeed  sensitive
to  variations  in  F0  and  spectral  content  (or  timbre,  related  to  voice  quality).  Is  it
possible  that  the  ability  to  make  such  discriminations  might  have  implications  for 
infant  processing  of  natural  speech? 

Several  studies  have  demonstrated  that  young  infants  prefer  a particular 
speech  register  (known  as  infant-directed  speech)  to  the  typical  speech  directed 
to  adults.  This  specialized  speech  register  is  characterized  by  an  expanded  F0
contour,  higher  peak  F0,  longer  pauses,  and  shorter  utterances.  In  short,  the
durational  and  pitch  cues  present  in  adult-directed  (AD)  speech  appear  to  be 
exaggerated  in  infant-directed  (ID)  speech.  This  exaggerated  speech  style  may 
serve  both  a prosodic  and  a paralinguistic  function.  The  focus  in  this  section  is 
on  infants’  abilities  to  discriminate  infant-directed  utterances  from  other  utterances 
and  on  the  affective  function  of  this  speech  register. 

Using  an  infant  auditory  preference  procedure  (operant  head-turn),  Fernald 
(1985)  demonstrated  that  four-month-old  infants  prefer  to  listen  to  ID  speech. 
Infants  turned  their  heads  significantly  more  often  in  the  direction  required  to
produce  ID  speech  than  in  the  direction  which  produced  AD  speech.  Cooper  and 
Aslin  (1990)  extended  evidence  for  preferential  responding  to  ID  speech  into  the 
first  month  after  birth.  The  experiment  utilized  a visual-fixation-based  auditory- 
preference  procedure.  One-month-old  infants  looked  longer  at  a checkerboard 
when  looking  produced  ID  speech  as  opposed  to  AD  speech.  Similar  effects 
were  obtained  for  newborn  infants. 

Werker  and  MacLeod  (1989)  measured  both  the  attentional  and  affective 
responsiveness  of  4-  to  9-month-olds  to  ID  and  AD  speech.  Infants  received 
videotape  presentations  of  adults  reciting  scripts  in  either  ID  or  AD  speech.  Infant 
attentiveness  was  measured  by  the  amount  of  time  spent  watching  the  videotape 
presentations  and  affective  responsiveness  was  assessed  by  two  raters.  Both 
younger  and  older  infants  looked  longer  at  ID  presentations  than  at  AD 
presentations.  In  addition,  raters  blind  to  the  type  of  presentation  rated  infants 
viewing  ID  speech  as  significantly  more  affectively  responsive  than  the  same 
infants  viewing  AD  speech. 

In  a recent  study,  Fernald  (1989)  provided  evidence  that  the  instrumental 
communicative  intent  of  utterances  is  more  salient  in  ID  than  in  AD  speech. 
Samples  of  each  speech  type  were  provided  by  mothers  of  12-month-old  infants
in  five  standard  contexts:  attention-bid,  approval,  prohibition,  comfort,  and  game
initiation/telephone.  The  utterances  were  low-pass  filtered  at  400  Hz  and
presented  to  adult  raters.  The  communicative  intent  of  ID  utterances  was
recognized  with  significantly  greater  accuracy  than  the  intent  of  AD  utterances. 
In  a later  study,  Fernald  (1989,  as  cited  in  Fernald,  1992)  found  that  5-month-olds 
showed  differential  affective  responsiveness  to  approval  and  prohibition 
vocalizations  spoken  in  ID  speech  across  three  languages. 

Other  work  utilizing  discrimination  and  social-referencing  paradigms  (Campos 
& Stenberg,  1981)  has  demonstrated  that  infants  discriminate  discrete  affective 
vocal  expressions  and  that  their  behavior  is  guided  by  this  paralinguistic 
information.  Walker-Andrews  and  Grolnick  (1983)  habituated  5-month-olds  to  a 
happy  or  sad  visual  display  and  a matching  vocal  expression.  After  habituation, 
the  vocal  expression  was  changed  (to  happy  if  the  habituation  stimulus  was  sad 
or  to  sad  if  the  habituation  stimulus  was  happy)  and  the  visual  display  remained 
the  same.  It  was  expected  that,  if  infants  detected  a change  in  the  auditory 
stimulus,  they  would  increase  their  looking  time  to  the  habituated  visual  display. 
As  predicted,  dishabituation  occurred  when  the  vocal  expression  was  changed. 

Svejda  (1981)  utilized  a social-referencing  approach  in  which  ambiguous 
remote  controlled  toys  (toys  whose  valence  was  situationally  dependent)  were
used  to  elicit  approach  or  avoidance  behavior  in  8  1/2-  and  11-month-old  infants.
Mothers  of  the  infants  produced  joy,  fear,  and  anger  vocal  expressions  of 
controlled  lexical  content  while  sitting  just  out  of  the  infants’  visual  field.  Infants’ 
speed  of  approach  to  the  stimulus  toys  at  both  ages  was  directed  by  maternal 
paralanguage.  That  is,  infants  approached  the  toys  more  rapidly  when  mothers
produced  joy  vocalizations  and  with  greater  latency  when  mothers  produced 
either  fear  or  anger  vocalizations.  This  literature  suggests  that  infants  are  capable 
of  making  distinctions  between  utterances  that  differ  paralinguistically  and  that 
these  affective  variations  are  sufficiently  meaningful  to  direct  infant  behavior. 
While  only  a few  studies  have  directly  addressed  the  issue  of  vocal  affect 
discrimination  in  infancy,  even  fewer  studies  have  been  conducted  on  this  topic 
in  early  childhood. 

Children 

Two  studies  indicate  that  children  are  able  to  detect  vocal  affect  when  lexical 
cues  do  not  provide  competing  information  (Dimitrovsky,  1964;  Matsumoto  & 
Kishimoto,  1983).  In  the  Dimitrovsky  (1964)  study,  5-  to  12-year-olds  heard  a 
standard  paragraph  read  in  four  affects  (happiness,  anger,  love,  and  sadness). 
Even  though  their  performance  was  above  chance,  5-year-olds  did  relatively 
poorly  (M  = 8.07  out  of  a possible  24)  in  detecting  the  intended  affective  states. 
Matsumoto  and  Kishimoto  (1983)  tested  4-  to  9-year-olds  on  the  Japanese 
syllabary  read  in  four  affects  (happiness,  surprise,  sadness,  and  anger). 
American  children  did  not  perform  above  chance  across  the  different  emotions 
until  age  6.  Taken  together,  these  results  suggest  that,  in  early  childhood,  the 
ability  to  discriminate  and  name  different  vocal  affects  is  fairly  fragile.  There  are 
several  possible  scenarios  to  describe  paralinguistic  sensitivity  across  infancy  and 
early  childhood.  The  ability  to  detect  differences  in  vocal  affect  may  develop
gradually  beginning  in  infancy  or  perhaps  young  children  are  actually  slightly  less 
sensitive  to  these  variations  than  are  infants.  Given  the  differences  in  procedure 
from  infant  studies  to  studies  of  early  childhood,  it  is  difficult  to  determine  which 
of  these  scenarios  is  most  accurate.  A third  possibility  is  that  young  children  are, 
in  fact,  sensitive  to  vocal  paralanguage  in  all  but  a few  special  circumstances. 
Such  circumstances  may  involve  the  need  to  use  verbal  labels  (e.g.,  surprise)  that 
are  not  well  practiced  or  the  need  to  make  affective  attributions  when  auditory 
sources  provide  discrepant  information. 

The  Case  of  Discrepancy 

Developmental  differences  in  the  decoding  of  vocal  paralanguage  become 
even  more  interesting  when  one  considers  the  effect  of  discrepancy  between 
lexical  and  paralinguistic  cues  on  listeners’  judgments.  Message  discrepancy  can 
take  many  forms  depending  on  the  sources  of  information  available  to  the  listener. 
In  this  section,  experiments  in  which  discrepancy  has  been  limited  to  the  auditory 
modality  (i.e.,  discrepancy  between  lexical  and  paralinguistic  cues)  will  be 
reviewed. 

Adults 

Because  both  lexical  cues  and  paralanguage  are  sources  of  information  about 
the  speaker,  it  is  interesting  to  contrast  these  channels  to  determine  which  source 
of  information  is  considered  most  valid  by  listeners.  In  a study  by  Mehrabian  and 
Weiner  (1967),  single  words  which  represented  positive,  neutral,  and  negative
attitudes  and  paralinguistic  cues  which  represented  these  same  dimensions  were 
combined.  The  stimulus  words  were  chosen  by  means  of  objective  ratings  while 
paralinguistic  cues  were  obtained  by  instructing  two  speakers  to  convey  either  an 
attitude  of  liking,  high  evaluation,  or  preference;  a neutral  attitude;  or  an  attitude 
of  disliking,  low  evaluation,  or  lack  of  preference.  Listeners  were  instructed  to 
attend  selectively  to  one  or  both  channels.  For  both  speakers,  the  effect  of 
paralanguage  was  significant  for  both  the  "use  tone  only"  and  the  "use  all 
information"  conditions  and  the  effect  of  lexical  cues  was  significant  only  for  the 
"use  content  only"  condition. 

Solomon  and  Yaeger  (1969)  also  studied  the  relative  importance  of  lexical 
cues  and  paralanguage  on  adults’  interpretations  of  discrepant  messages. 
Listeners  heard  recordings  of  short  sentences  in  which  lexical  content  was  varied 
along  positive,  neutral,  and  negative  dimensions.  Paralinguistic  information  was 
varied  to  convey  pleasure,  indifference,  and  displeasure.  Listeners  were  told  that 
they  were  listening  to  an  art  teacher  giving  feedback  to  a student.  After  each 
stimulus  sentence  was  played,  the  subjects  were  asked  three  questions:  1)  What
did  the  teacher  mean?  2)  How  does  the  child  feel?  3)  Does  the  teacher  like  or 
dislike  the  child?  Perhaps  the  most  interesting  result  was  that  the  effect  of  lexical 
content  was  greatest  for  question  1 and  least  for  question  3.  The  effect  of 
paralinguistic  cues  showed  the  opposite  pattern.  This  suggests  that  lexical 
information  may  be  a  preferred  source  of  information  about  a  speaker’s  cognitive
state  (What  did  the  teacher  mean?)  while  paralanguage  may  be  a preferred 
source  of  information  about  affect  (Does  the  teacher  like  or  dislike  the  child?). 
The  preference  for  the  use  of  paralanguage  to  make  attributions  of  speaker  affect 
is  apparent  across  both  of  these  studies.  An  obvious  question,  then,  is  whether 
this  effect  is  obtained  developmentally. 

Infants 

Very  little  attention  has  been  given  to  the  ability  of  infants  to  resolve 
discrepancy.  Two  studies  have  investigated  visual/auditory  discrepancy  (Barrett, 
1984;  Volkmar,  Hoder,  & Siegel,  1980)  and  have  produced  suggestive,  but 
inconclusive,  results  regarding  the  primacy  of  vocal  cues  over  visual  cues.  To  date,
however,  only  a single  study  of  infants’  resolution  of  lexical  and  paralinguistic 
discrepancy  has  been  conducted.  Lawrence  and  Fernald  (1993)  used  a social- 
referencing  approach  to  investigate  the  resolution  of  lexical/paralinguistic 
discrepancy  by  9-  and  18-month-olds.  Infants  were  presented  with  an  ambiguous
stimulus  toy  and  mothers  produced  utterances  in  which  either  the  lexical  content 
encouraged  the  infant  to  approach  the  toy  while  paralinguistic  cues  suggested 
prohibition  or  the  lexical  content  was  prohibitive  while  the  paralinguistic  content 
encouraged  approach  (e.g.,  "No  don’t,  don’t  touch"  said  in  a positive,  or 
encouraging,  voice).

The  results  were  striking,  with  9-month-olds  approaching  the  toy  when
maternal  paralanguage  encouraged  approach  and  avoiding  the  toy  when
maternal  paralanguage  was  prohibitive.  In  contrast,  18-month-olds  showed  both
distress  at  the  discrepancy  and  approach  or  avoidance  of  the  toy  consistent  with 
the  lexical  content  of  the  utterance.  Because  there  is  only  a single  investigation 
of  this  effect  and  it  is  unreplicated,  the  interpretation  must  necessarily  be  cautious. 
However,  the  results  suggest  that  there  is  a reduction  in  behavior  regulation  by 
paralinguistic  information  in  cases  of  discrepancy  corresponding  to  the  transition 
from  infancy  to  early  language-learning. 

Children 

Solomon  and  Ali  (1972)  conducted  a  cross-sectional  study  to  investigate
developmental  differences  in  the  interpretation  of  discrepant  messages.  Listeners 
included  children  in  kindergarten  and  grades  2,  4,  6,  8,  10,  and  12,  and  college 
sophomores.  As  in  the  Solomon  and  Yaeger  (1969)  study,  sentences  representing 
three  levels  of  lexical  content  (positive,  neutral,  and  negative)  were  systematically 
combined  with  pleased,  indifferent,  and  displeased  paralinguistic  information. 

Again,  listeners  were  told  that  they  were  about  to  hear  an  art  teacher  speaking 
to  a  student.  After  each  message  they  were  asked  three  questions:  1)  What  did
the  teacher  mean?  2)  How  does  the  child  feel?  3)  Does  the  teacher  like  or
dislike  the  child?  The  relative  importance  of  paralanguage  appeared  to  be  a
function  both  of  the  question  and  of  the  child’s  age.  Younger  children  relied 
almost  entirely  on  lexical  content  in  their  responses  to  all  of  the  questions.  For
question  #3,  however,  the  effect  of  lexical  content  peaked  at  grades  4 and  6 and 
showed  a slow  decline  thereafter.  At  the  college  level,  the  effect  of  paralanguage 
was  greater  than  that  of  lexical  content  for  question  #3. 

This  developmental  effect,  the  greater  weighting  of  the  lexicon  than  of 
paralanguage  by  children  relative  to  adults,  is  not  limited  to  the  domain  of 
emotions.  Similar  effects  have  been  obtained  in  recent  research  on  the  auditory 
Stroop  effect  (Jerger,  Martin,  & Pirozzolo,  1988)  and  melody  recognition 
(Morrongiello  & Roes,  1990).  Jerger  et  al.  (1988)  tested  children  3 to  6 years  of 
age  in  a reaction-time  experiment  in  which  they  listened  to  words  spoken  by  a 
male  and  a female  voice.  Children  were  instructed  to  ignore  lexical  information 
(e.g., Mommy, ice-cream, Daddy) and to press a key labeled "Mommy" when they
heard  the  female  voice  or  a key  labeled  "Daddy"  when  they  heard  the  male  voice. 
Children  were  unable  to  selectively  attend  to  nonlinguistic,  acoustic  information. 
Reaction  times  were  slower  for  conflicting  stimuli  (female  voice  saying  "Daddy"  or 
male  voice  saying  "Mommy")  than  for  consistent  stimuli  (female  voice  saying 
"Mommy"  or  male  voice  saying  "Daddy").  This  effect  was  greater  for  younger  than 
for  older  children. 

Morrongiello  and  her  colleagues  approached  this  problem  from  the 
perspective  of  recognition  memory  for  melodies  (nonlinguistic,  acoustic 
information)  and  lyrics  (lexical  information).  Children  (5-  to  6-year-olds)  and 
adults were presented with three novel songs (melody and lyrics). During the
recognition  phase  of  the  experiment,  subjects  heard  one  of  five  types  of  songs: 
same melody, same lyrics; new melody, new lyrics; new melody, same lyrics;
same  melody,  new  lyrics;  or  a mismatch  of  the  melody  of  one  of  the  original 
songs  with  the  lyrics  of  another.  The  task  was  to  identify  how  similar  the  songs 
were  to  those  heard  previously.  When  the  same  lyrics  were  combined  with  a new 
melody, children were much more likely than adults to respond "same." Similarly,
when  new  lyrics  were  combined  with  the  same  melody,  children  were  more  likely 
than  adults  to  respond  that  the  pairing  was  "not  at  all  the  same." 

Across judgments of affect, gender, and melody, children seem to weight
lexical  cues  more  heavily  than  competing  nonlinguistic,  acoustic  cues.  Because 
this  appears  to  be  a general  developmental  phenomenon,  explanations  which 
focus  solely  on  the  affective  domain  are  not  sufficient  to  account  for  the  effects. 
Instead,  it  is  necessary  to  consider  explanatory  phenomena  (e.g.,  the  acquisition 
of  receptive  and  productive  language)  which  are  likely  to  influence  development 
across  domains.  A recent  focus  in  the  language  literature  is  on  the  role  of  speech 
prosody  in  both  directing  attention  and  facilitating  comprehension.  The  presence 
of  exaggerated  prosodic  contrasts  in  language  input,  and  their  relevance  for 
young  language-learners,  may  have  consequences  for  the  paralinguistic  utilization 
of  acoustic  variables. 

The Role of Speech Prosody

Acoustic  variables  produced  by  the  vocal  tract  play  a dual  role,  informing 
listeners  on  the  one  hand  about  important  linguistic  information  (prosodic 
function)  and  about  speaker  characteristics  such  as  gender  and  affect 
(paralinguistic  function)  on  the  other.  As  was  demonstrated  in  the  section  on 
sensitivity  to  vocal  paralanguage,  there  is  compelling  evidence  that  young  infants 
discriminate  these  acoustic  variables  and  that  they  are  affectively  responsive  to 
the  particular  configuration  of  acoustic  features  known  as  infant-directed  speech. 
Importantly,  this  characteristic  speech  style  is  maintained  into  the  child’s  fourth 
year  (Fernald,  1994)  and  some  aspects  of  infant-  or  child-directed  speech  are 
maintained  into  the  child’s  fifth  year  (Garnica,  1987).  The  question  arises  then  of 
how  these  acoustic  features  are  processed  by  young  infants  (i.e.,  as  prosodic  or 
as  paralinguistic  cues,  or  both)  and  whether  the  processing  priority  of  prosody 
versus  vocal  paralanguage  changes  with  development. 

There  has  been  a rising  interest  in  the  role  of  prosodic  cues  in  early  language 
acquisition.  This  is  an  important  development  in  the  language  literature  because 
it  suggests  a mechanism  whereby  young  language-learners  may  begin  to  parse 
the  speech  stream,  identify  individual  lexical  units,  and  discern  syntactic  structure 
(Echols,  1993;  Kelly,  1992;  Morgan,  Meier,  & Newport,  1987).  Previously,  it  had 
been  asserted  that  linguistic  input  was  so  error  prone  as  to  render  environmental 
accounts  of  language-learning  implausible  (Chomsky,  1959). 

At least two types of evidence are necessary to demonstrate that prosodic
cues  facilitate  language  acquisition.  First,  it  must  be  shown  that  specific  linguistic 
information  is  marked  by  prosodic  cues.  Second,  listeners  must  be  able  to 
perceive  prosodic  contrasts,  thereby  enhancing  linguistic  processing  and 
comprehension.  Recent  research  has  indicated  that  prosody  in  speech  to  infants 
and  young  children  marks  specific  linguistic  content  and  that  this  marking 
facilitates  linguistic  processing  and  comprehension. 

Prosodic Marking

Garnica  (1987)  studied  adult  encoding  of  linguistic  features  when  speaking  to 
another  adult  and  to  the  speaker’s  own  child  (either  a 2-year-old  or  a 5-year-old). 
Subjects  completed  three  verbal  tasks:  a picture  task  in  which  they  were  asked 
to  make  up  a story  about  several  pictures  taken  from  magazines,  a puzzle  task 
in  which  the  subject  gave  instructions  on  solving  the  puzzle,  and  a story  reading 
task  in  which  the  subject  read  a short  passage  aloud.  Speech  samples  were 
recorded  for  later  acoustic  and  perceptual  analyses. 

Several  acoustic  and  perceptual  features  distinguished  ID  from  AD  speech. 
Speech  directed  to  children  consisted  of  more  instances  of  rising  sentence-final 
pitch  terminals  (even  for  imperatives),  longer  durations  of  content  words  (these 
words  consisted  of  adjectives  for  the  5-year-olds  and  both  verbs  and  adjectives 
for the 2-year-olds), and many cases of double primary stress (for the 2-year-olds
only). Of particular interest from a linguistic encoding perspective are the cases
of extended durations and double primary stress for content words.
Duration is an important correlate, although not the only correlate, of the
percept of stress. The presence of increased duration and double primary stress
on content words suggests that mothers were acoustically highlighting the
important words in the sentence. As Garnica (1987) suggests:

By  prolonging  the  duration  of  the  syllable  nucleus  of  red  in  Push  in  the  red 
piece  the  speaker  implies  with  greater  force  the  propositions  "not  the  yellow 
piece,  not  the  blue  piece."  (p.84) 

Fernald  and  Mazzie  (1991)  specifically  considered  the  role  of  prosodic 
emphasis  in  highlighting  new  lexical  information.  In  one  experiment,  mothers  told 
a story  based  on  a picture  book  to  their  14-month-old  infant  and  to  an  adult.  Six 
target  words  were  identified  by  visually  highlighting  specific  articles  of  clothing  the 
first  time  that  they  occurred  in  the  picture  book.  Mothers  used  characteristic  ID 
speech  when  telling  the  story  to  infants  but  not  to  adults.  More  important, 
however, mothers placed the target words on the F0 peaks of their utterances
when  speaking  to  infants  but  not  when  speaking  to  an  adult.  Perceptual  analyses 
corroborated  this  effect.  A much  higher  proportion  of  stressed  target  words 
occurred  in  speech  to  infants  than  in  speech  to  adults. 

A second  experiment  examined  adults’  use  of  prosodic  marking  when 
conveying  new  information  to  another  adult.  The  experiment  involved  teaching 
an  adult  to  assemble  a kitchen  appliance  using  both  familiar  and  unfamiliar  labels. 
Perceptual judgments revealed that unfamiliar terms received primary stress when
they  were  introduced  as  new  information,  but  not  once  their  use  in  the  procedure 
had  been  established.  However,  acoustic  measures  indicated  a difference  in  the 
marking  of  new  information  to  infants  and  adults.  In  the  first  experiment,  75%  of 
target words to infants occurred on the F0 peak of the utterance, in contrast to
25%  of  the  target  words  spoken  to  adults  in  the  second  experiment.  These 
experiments  illustrate  the  role  of  prosodic  variables  in  marking  important 
information  in  speech  to  both  infants  and  adults.  However,  speech  addressed  to 
infants  appears  to  be  much  richer  in  terms  of  the  extent  to  which  sentential 
prosody  is  used  to  mark  new  information. 

Maternal  prosody  in  speech  to  infants  and  young  children  appears  to  highlight 
important  lexical  units  such  as  nouns,  verbs,  and  adjectives.  A case  for  the  role 
of  speech  prosody  in  facilitating  language  acquisition  would  be  stronger  if  it  could 
be  shown  that  maternal  feedback  to  children’s  language  errors  is  similarly 
acoustically  marked.  There  is  evidence  that  children  receive  corrective  feedback 
for  certain  grammatical  errors  in  the  form  of  recasts  (Bohannon  & Stanowicz, 
1988;  Farrar,  1992).  Recasts  are  utterances  in  which  an  adult  speaker  repairs  a 
child’s  grammatical  error  or  models  a new  grammatical  construction. 

Farrar  and  Friend  (1993)  tested  the  hypothesis  that  maternal  recasts  are 
acoustically  marked  relative  to  utterances  which  continue  a topic  of  conversation 
but  provide  no  negative  evidence  with  regard  to  the  child’s  grammaticality. 
Maternal recasts and topic continuations were identified in a corpus of utterances
from  conversations  between  mothers  and  their  22-month-old  children.  While  there 
was  no  evidence  of  specific  acoustic  marking  of  grammatical  corrections  in 
maternal  recasts,  the  global  acoustic  features  of  maternal  recasts  differed 
significantly  from  those  of  topic  continuations.  Specifically,  maternal  recasts  were 
characterized  by  expanded  contours  and  higher  peaks  than  maternal  topic 
continuations.  In  evaluating  the  significance  of  this  effect,  it  is  important  to  note 
that these mothers are already using expanded F0 contours in their topic
continuations (average F0 range = 364 Hz; cf. Garnica, 1987, for F0 ranges in AD
speech).  The  acoustic  marking  of  maternal  recasts  appears  to  constitute  a further 
expansion  to  emphasize  the  grammatical  modification.  This  may  serve  to  facilitate 
speech  perception  by  identifying  new  phrases  and  grammatical  morphemes. 

It  seems  clear  that  the  linguistic  content  of  utterances  to  infants  and  young 
children  enjoys  special  status  with  regard  to  prosodic  marking.  Maternal  prosody 
highlights  important  lexical  units  and  grammatical  feedback.  Given  the  potential 
of  prosodic  marking  to  inform  young  language-learners,  it  is  necessary  to 
demonstrate  their  discrimination  of  prosodic  contrasts  and  a resulting  facilitation 
of  speech  processing  and  comprehension. 

Perception  of  Prosodic  Markers 

One  of  the  earliest  ways  in  which  ID  speech  might  facilitate  language-learning 
is  by  perceptually  dividing  the  speech  stream  into  important  syntactic  units  such 
as clauses. Experiments have used an infant preference procedure to determine
whether  infants  are,  in  fact,  sensitive  to  the  clausal  structure  of  utterances  in  ID 
and  AD  speech.  Hirsh-Pasek,  Kemler  Nelson,  Jusczyk,  Cassidy,  Druss,  and 
Kennedy  (1987)  investigated  the  perceptual  sensitivity  of  7-  to  10-month-old 
infants  to  the  acoustic  correlates  of  clausal  structure  in  English.  "Natural"  and 
"Unnatural"  speech  segments  were  developed  by  inserting  1 sec  pauses  in 
previously  recorded  maternal  utterances.  The  Natural  versions  had  pauses 
inserted  at  clause  boundaries  while  the  Unnatural  versions  had  pauses  inserted 
between  words  in  the  middle  of  clauses.  Infant  preference  was  assessed  by 
measuring  the  direction  and  duration  of  head-turns  to  the  "Natural"  versus  the 
"Unnatural"  stimuli.  Infants  did  not  orient  preferentially  in  the  direction  of  the 
Natural  samples  but  they  did  orient  longer  to  the  Natural  than  to  the  Unnatural 
speech  samples. 
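
The stimulus manipulation described above can be pictured with a small sketch. It assumes a hypothetical two-clause utterance and a pause token; the example sentence, the function names, and the choice to place each mid-clause pause at the midpoint of the clause are illustrative assumptions, not details reported for the original stimuli.

```python
PAUSE = "<1 s pause>"

# Hypothetical utterance, already divided into clauses (lists of words)
EXAMPLE_CLAUSES = [
    ["the", "dog", "barked"],
    ["and", "the", "cat", "ran", "away"],
]

def natural_version(clauses):
    """Insert a pause at each clause boundary."""
    out = []
    for i, clause in enumerate(clauses):
        if i > 0:
            out.append(PAUSE)
        out.extend(clause)
    return out

def unnatural_version(clauses):
    """Insert the same number of pauses, but between words inside clauses."""
    out = []
    for i, clause in enumerate(clauses):
        if i > 0 and len(clause) > 1:
            mid = len(clause) // 2
            out.extend(clause[:mid] + [PAUSE] + clause[mid:])
        else:
            out.extend(clause)
    return out

print(" ".join(natural_version(EXAMPLE_CLAUSES)))
print(" ".join(unnatural_version(EXAMPLE_CLAUSES)))
```

Both versions contain the same words and the same number of pauses; only the position of the pause relative to the clause boundary differs, which is the contrast the preference procedure is meant to isolate.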

To  determine  whether  infants  are  also  sensitive  to  prosodic  cues  in  AD 
speech,  Kemler  Nelson,  Hirsh-Pasek,  Jusczyk,  and  Cassidy  (1989)  developed 
Natural  and  Unnatural  versions  of  both  ID  and  AD  speech.  Infants  oriented  longer 
to  the  Natural  versions  of  ID  speech  than  to  the  Unnatural  versions  but  did  not 
evidence  a preference  for  either  set  of  speech  samples  in  AD  speech.  These 
studies  suggest  that  infants  are  sensitive  to  segmentation  cues  but  only  when 
they  occur  in  ID  speech.  While  the  precise  cues  leading  to  perceptual 
segmentation remain unclear, this research does indicate that ID speech may be
an  important  vehicle  for  making  these  cues  salient  to  prelinguistic  infants. 

Given  that  infants  are  sensitive  to  features  of  linguistic  input  which  correspond 
to  syntactic  units,  to  what  extent  is  their  processing  of  speech  facilitated?  Mandel, 
Jusczyk  and  Kemler  Nelson  (1993)  tested  the  hypothesis  that  the  prosodic 
organization  of  speech  input  would  enhance  infants’  memory  for  phonetic 
information.  Two-month-olds  heard  a set  of  words  read  either  as  sentences  or  as 
words  in  a list.  In  addition,  sentences  or  words  differed  from  each  other  by  either 
one or two phonemes. For example, "rat chased" versus "cat chased" represents
a difference  of  one  phoneme  while  "rat  chased"  versus  "cat  raced"  represents  a 
difference  of  two  phonemes. 

Infants  were  tested  using  a high-amplitude-sucking  procedure  in  which 
sucking  above  a specified  criterion  produced  either  a single  sentence  or  a list 
sequence.  Habituation  to  the  target  stimulus  was  followed  by  a two-minute  silent 
interval during which sucking did not produce a stimulus. At the end of this
period,  infants  heard  the  habituation  stimulus  again,  a stimulus  of  the  same  type 
(either  sentence  or  list)  which  differed  by  one  phoneme,  or  a stimulus  of  the  same 
type  which  differed  by  two  phonemes.  Recovery  from  habituation  was  greatest 
for  infants  who  received  sentence,  as  opposed  to  list,  presentation  and  for  those 
infants  in  the  single,  as  opposed  to  the  double,  phonemic  change  condition. 
These  effects  were  interpreted  as  evidence  for  the  role  of  global,  sentential 
prosody in facilitating speech processing in infancy. However, the absence of a
nonlinguistic  control  condition  renders  this  interpretation  tenuous.  What  has  been 
demonstrated  in  these  discrimination  experiments  is  that  prelinguistic  infants  are 
sensitive  to  cues  (perhaps  nonlinguistic,  acoustic  cues)  that  correspond  to 
linguistic  structure.  It  is  unclear  at  what  point  in  development  one  can  begin  to 
speak  of  the  discrimination  of  these  cues  as  "linguistic"  versus  general  auditory 
processing.  It  is  certainly  reasonable  to  expect  that  the  ability  to  process  these 
cues  predisposes  infants  to  attend  to  linguistic  structure,  although  perhaps  at  a 
later  point  in  development. 

Recently,  Fernald,  McRoberts,  and  Herrera  (1994)  presented  evidence  that 
language  comprehension  of  older  infants  is  facilitated  by  ID  speech.  Fernald  et 
al.  used  a preferential  looking  procedure  to  assess  infants’  comprehension  of 
familiar  nouns  presented  in  sentence-final  position  in  ID  and  AD  speech.  Visual 
stimuli  consisted  of  pictures  representing  the  target  noun  and  a distractor. 
Fifteen-month-old  infants  looked  longer  to  the  target  noun  when  its  label  was 
presented  in  ID  rather  than  AD  speech.  Words  presented  in  sentence-final 
position  have  a distinct  advantage  when  it  comes  to  prosodic  marking.  They  are 
more likely than other words in the sentence to be placed on the F0 peak and
they  are  likely  to  be  of  greater  duration.  Subsequent  experiments  demonstrated 
that 15-month-olds better recognized words in sentence-final than in
sentence-medial position and that this effect was due to the increased duration
of sentence-final nouns.

Two  types  of  evidence  with  regard  to  the  role  of  ID  speech  in  the  language 
acquisition  process  have  been  presented.  First,  mothers  appear  to  acoustically 
mark  important  lexical  information  (e.g.,  novel  words,  content  words,  and 
grammatical  corrections).  Second,  infants  are  sensitive  to  acoustic  variables 
marking  syntactic  units  (e.g.,  clauses)  and  lexical  units  (e.g.,  familiar  nouns)  and 
their  processing  of  these  units  appears  facilitated  as  a result.  However,  whether 
this  facilitation  constitutes  linguistic  or  general  auditory  processing  has  yet  to  be 
addressed.  More  germane  to  the  present  paper,  since  prosody  and  vocal 
paralanguage exploit the same basic set of acoustic variables (e.g., F0, intensity,
and  duration),  to  what  extent  is  the  processing  of  these  variables  prosodic  as 
opposed  to  paralinguistic? 

Fernald  (1991,  1992)  proposed  a four-stage  developmental  sequence  of 
prosodic  and  paralinguistic  processing  during  the  first  year  of  life.  First,  the 
exaggerated  acoustic  contrasts  of  ID  speech  have  the  ability  to  affect  the  neonate 
directly  (e.g.,  abrupt,  high-pitch  sounds  initiate  a different  instrumental  response 
than slow, rhythmic, low-pitch sounds). The initial response of the prelinguistic
infant  to  these  acoustic  variations  is  primarily  affective.  That  is,  processing  priority 
should  favor  paralanguage  very  early  in  development.  At  the  second  stage, 
infants  continue  to  respond  affectively  to  acoustic  contrasts  in  speech,  but  their 
responses become increasingly differentiated as they discover the covariation of
maternal  intent  with  the  acoustic  signal.  In  the  third  stage,  infants  develop 
associations  between  acoustically  emphasized  lexical  units  and  external  referents. 
(Presumably,  a shift  toward  greater  prosodic  processing  of  acoustic  input  occurs 
just  prior  to,  or  early  in,  this  stage.)  Finally,  in  the  fourth  stage,  young  language- 
learners  acquire  the  arbitrary  sound-meaning  correspondences  which  are 
heralded  as  the  first  evidence  of  receptive  language. 

The  present  research  adopts  this  model  as  a starting  point  in  describing  the 
developmental  relation  between  prosodic  and  paralinguistic  processing  and 
assumes  further  that  processing  priority  favors  the  prosodic  function  of  specific 
acoustic variables (e.g., F0) into the fourth year of life. This corresponds to the
latest  point  in  development  at  which  children  continue  to  be  exposed  to  the 
exaggerated  intonation  of  ID  speech.  It  also  follows  the  naming  explosion  that 
occurs  at  approximately  18  months  and  the  burst  in  grammatical  development 
that  occurs  between  two  and  three  years  of  age  (Bates  & Marchman,  1988).  By 
four  years  of  age,  children  have  made  remarkable  progress  in  language 
acquisition. 

From approximately 18 months to at least 4 years of age, certain acoustic
cues  in  the  speech  signal  are  presumed  to  play  a primarily  prosodic  role.  The 
paralinguistic  role  of  these  variables  is  presumed  to  be  secondary  to  their 
prosodic  role,  particularly  in  discrepant  messages  in  which  the  prosodically 
highlighted lexical content conflicts with the meaning conveyed by paralanguage.
Importantly,  this  constraint  does  not  apply  to  configurations  of  acoustic  variables 
(e.g.,  voice  quality)  which  are  not  known  to  contribute  in  any  substantial  way  to 
prosodic  processing.  Instead,  the  bias  toward  prosodic  processing  of  other 
acoustic  variables  results  in  a directing  of  attention  to  lexical,  rather  than 
paralinguistic,  sources  of  information.  Processing  priority  for  prosody  over 
paralanguage  is  expected  to  result  in  a linguistic  constraint  on  the  interpretation 
of  affectively  discrepant  auditory  messages.  However,  children  in  this  age  range 
are  also  subject  to  a cognitive  constraint  on  competing  representations  that  may 
act  in  conjunction  with  the  linguistic  constraint  to  influence  interpretations  of 
discrepancy. 

The  Dual  Representation  Problem 

The  dual  representation  problem,  or  the  inability  to  represent  an  event  in  more 
than  one  way  simultaneously,  may  lead  children  to  focus  on  a single  source  of 
information  when  two  sources  (e.g.,  lexical  and  paralinguistic)  conflict.  This 
limitation  may  pose  a cognitive  constraint  on  interpretations  of  affective 
discrepancy.  It  is  well  documented  that  3-  to  4-year-olds  have  difficulty 
understanding  that  "reality"  can  be  represented  in  more  than  one  way  (Flavell, 
1988).  Children’s  representational  ability  is  assessed  by  means  of  an 
Appearance-Reality  (AR)  task.  Although  the  AR  task  has  been  extended  to  a 
number  of  physical  domains  (i.e.,  visual,  auditory,  olfactory,  and  tactile),  the 
standard task is visual (Flavell, Green, & Flavell, 1986). In one version, children
are  shown  cardboard  cutouts  of  animal  shapes.  The  cutouts  are  placed  behind 
a filter  which  changes  their  color  and  the  children  are  asked  what  color  the  animal 
appears  to  be  when  it  is  behind  the  filter  and  what  color  it  is  "really  and  truly." 
Six-  and  7-year-olds  respond  with  the  "apparent"  color  of  an  object  when  asked 
the  appearance  question  and  with  the  "real"  color  of  the  object  when  asked  the 
reality  question,  but  3-  and  4-year-olds  do  not  show  consistently  correct 
performance  (Flavell,  1986). 

Several  researchers  have  extended  these  findings  to  the  domain  of  emotions. 
At  issue  is  whether  an  understanding  of  AR  distinctions  is  evident  across  physical 
and affective domains at about the same point in development. Harris et al.
(1986) suggest that, to the extent that both affective and physical AR tap the same
basic  cognitive  processes,  some  correspondence  between  performance  in  these 
domains  can  be  expected.  Children’s  understanding  of  the  AR  distinction  in  the 
affective  domain  has  been  assessed  using  a display  rules  paradigm.  This 
involves  telling  children  a story  about  a situation  in  which  the  protagonist  would 
probably  feel  one  emotion  but  would  have  reason  to  appear  to  be  feeling 
something  else  (Gnepp,  1983;  Gnepp  & Hess,  1986;  Harris  et  al.,  1986;  Harris  & 
Gross,  1988;  Saarni,  1979).  Children  are  asked  how  the  protagonist  feels  and 
looks,  and  why  he  feels  and  looks  that  way.  Studies  using  this  paradigm  indicate 
that children 10 years old and older have a more sophisticated understanding of
affective AR than younger school-aged children (Gnepp, 1983; Gnepp & Hess,
1986; Saarni, 1979). However, even 4-year-olds demonstrate some understanding
of  affective  AR  when  they  are  oriented  to  relevant  cues  in  the  stories  during 
questioning  (Harris  et  al.,  1986). 

Two  recent  within-subjects  comparisons  of  physical  and  affective  AR  have 
been  conducted  to  address  more  clearly  the  issue  of  domain-specificity.  Gross 
(1989) tested 3- and 4-year-olds on modified versions of physical and affective
tasks.  In  the  physical  task,  children  saw  pens  whose  caps  did  not  match  the 
color  of  the  ink.  Each  pen  appeared  to  be  one  color  (the  color  of  the  cap)  but 
was  really  a different  color.  In  the  affective  task,  children  were  shown  dolls  who 
wore masks (apparent emotion) over their faces (real emotion). Both 3- and
4-year-olds made physical and affective AR distinctions, suggesting that the AR
distinction  is  present  early  in  development  and  that  it  is  domain-general. 

However,  understanding  affective  AR  requires  more  than  detecting 
discrepancies  between  successive  facial  expressions.  By  definition,  facial 
expression  is  apparent  emotion  which  does  not  occur  in  isolation.  Children  also 
have  access  to  situational  information  (e.g.,  Billy  was  opening  a disappointing 
birthday present when he smiled). To discern Billy’s real and apparent emotions,
the  child  must  infer  his  real  emotion  from  the  situational  context  and  identify  his 
apparent  emotion  from  his  facial  expression.  Identifying  facial  expression  in  the 
absence of situational context may not reflect a recognition of simultaneous, yet
discrepant  emotions  (one  apparent,  one  real)  that  is  central  to  the  AR  distinction. 

To address this issue directly, Friend and Davis (1993) conducted a
within-subjects comparison of physical and affective AR in which the perceptual
availability  of  real  and  apparent  emotions  in  the  affective  task  was  systematically 
varied  along  a dimension  of  similarity  to  the  perceptual  availability  of  real  and 
apparent  colors  in  the  standard  physical  task.  The  less  similar  the  two  tasks,  the 
more  the  affective  task  approached  the  kind  of  AR  problem  children  could  be 
expected  to  encounter  in  a naturalistic  setting.  When  the  two  tasks  were  similar, 
individual  children  performed  about  equally  well  on  each  task.  As  the  affective 
task  required  more  inference  (e.g.,  using  situational  information  to  infer  the  real 
emotion),  children’s  performance  declined  relative  to  their  performance  on 
physical  AR.  Across  levels  of  task  difficulty,  children’s  ability  to  reflect  upon 
representations  of  simultaneously  discrepant  emotions  increased  dramatically 
between  4 and  6 years  of  age. 

Research  on  children’s  understanding  of  the  AR  problem  in  general,  and  of 
affective  AR  in  particular,  provides  the  foundation  for  the  proposed  cognitive 
constraint  on  children’s  interpretations  of  discrepancy.  In  situations  in  which  a 
discrepancy  exists  between  lexical  and  paralinguistic  cues,  it  is  reasonable  to 
expect  that  children  under  5 to  6 years  of  age  would  focus  on  a single  source  of 
affective  information  (either  lexical  or  paralinguistic)  to  make  their  judgments. 
However, this constraint alone does not predict which source of information would
provide  the  basis  for  children’s  judgments.  Given  the  influence  of  the  linguistic 
constraint,  children  who  cannot  entertain  two  simultaneous  representations  of 
affectively  discrepant  messages  should  be  biased  toward  reporting  the  affect 
consistent  with  lexical,  but  not  with  paralinguistic,  cues. 

The  purpose  of  the  present  experiment  is  to  describe  developmental  changes 
in  paralinguistic  sensitivity  as  a function  of  the  proposed  linguistic  and  cognitive 
constraints,  although  paralinguistic  sensitivity  is  probably  also  influenced  by 
social-cognitive  mechanisms  (particularly  in  cases  of  discrepancy).  In  this 
experiment,  children  were  asked  to  rate  speaker  affect  in  three  speech  contexts. 
In  the  first  context,  children  heard  unaltered  tape-recordings  of  a woman  reading 
sentences  in  positive  and  negative  affects  (half  consistent  with  lexical  content  and 
half  discrepant).  In  this  full  speech  context,  all  lexical,  prosodic,  and  paralinguistic 
information  remained  intact.  This  condition  serves  as  a test  of  the  combined 
effect  of  the  linguistic  and  cognitive  constraints  on  children’s  paralinguistic 
sensitivity.  The  developmental  duration  of  the  linguistic  constraint  is  an  open 
question;  however,  the  extent  of  the  cognitive  constraint  has  been  empirically 
established.  Across  a wide  range  of  measures,  5-  to  6-year-olds  demonstrate  an 
ability  to  consider  alternative  representations  simultaneously.  Consequently,  the 
4-year-old  sample  was  expected  to  be  constrained  in  their  interpretations  of 
discrepant auditory messages by restrictions on the role of acoustic variables and
on the number of conflicting stimulus attributes that could be simultaneously
represented.  As  a result,  4-year-olds  were  expected  to  show  minimal  sensitivity 
to  paralinguistic  cues  and  to  base  their  judgments  on  lexical  content.  Seven-  and 
10-year-olds,  in  contrast,  were  expected  to  show  a significant  increment  in 
sensitivity  to  paralinguistic  cues  even  in  the  presence  of  conflicting  lexical  content. 

In  the  second  context,  children  heard  utterances  which  were  low-pass  filtered. 
The result of this procedure is to minimize lexical content while preserving F0 and
durational information. This condition provides a test of the linguistic constraint
alone for predicting paralinguistic sensitivity to one acoustic variable (F0)
correlated  with  both  prosody  and  paralanguage.  If  the  linguistic  constraint 
influences  paralinguistic  sensitivity  independently  of  the  cognitive  constraint,  4- 
year-olds  should  evidence  chance  performance  in  detecting  affective  variations 
in  filtered  speech  because  it  requires  that  they  make  affective  judgments  on  the 
basis of F0 alone. In contrast, 7- and 10-year-olds should show a significant
increment in performance over 4-year-olds. Specifically, the prosodic function of
F0 was expected to be more salient to young, language-learning children than was
the  paralinguistic  function.  Four-year-olds  represent  a point  early  in  the 
theoretical  transition  from  the  processing  of  acoustic  variables  as  prosody  to  a 
more  flexible  processing  of  these  cues  as  both  prosody  and  paralanguage. 

In the third context, the utterances were produced by reiterant speech. This
involves replacing the original syllables of an utterance with nonsense syllables.
The effect of this manipulation is to remove meaningful lexical information while
leaving prosodic and paralinguistic variables intact (Friend, 1991; Friend & Farrar,
in  press;  Liberman  & Streeter,  1978;  Nakatani  & Schaffer,  1977).  Because  this 
condition  preserves  the  full  range  of  paralinguistic  cues  (including  voice  quality 
information,  the  function  of  which  is  more  likely  paralinguistic  than  prosodic),  it 
provides  a baseline  of  children’s  paralinguistic  sensitivity  in  the  absence  of 
competing lexical information. Sensitivity to the paralinguistic content of these
utterances was expected to improve with age; however, all groups were expected
to perform relatively well in this condition due to the presence of configurations
of  acoustic  variables  whose  function  is  primarily  paralinguistic  and  the  absence 
of  competing  lexical  content. 

Summary  of  Predictions 

In  summary,  4-year-olds  were  expected  to  perform  below  chance  on 
discrepant  full  speech  due  to  the  proposed  cognitive  and  linguistic  constraints. 
That  is,  4-year-olds  were  expected  to  make  affective  attributions  on  the  basis  of 
lexical,  as  opposed  to  paralinguistic,  information.  The  role  of  the  linguistic 
constraint  was  expected  to  be  corroborated  in  4-year-olds’  poor  performance  on 
consistent  and  discrepant  filtered  speech  in  which  the  primary  paralinguistic  cue 
available is F0. (Recall that F0 is correlated with both prosody and paralanguage.)
In contrast to their performance on discrepant full speech and consistent and
discrepant  filtered  speech,  4-year-olds  were  expected  to  perform  relatively  well  on 
reiterant  speech  because  it  preserves  rich  voice  quality  information  without 
competing,  meaningful,  lexical  content. 

Seven-year-olds  were  expected  to  demonstrate  increased  sensitivity,  relative 
to  4-year-olds,  to  paralinguistic  cues  in  full  discrepant  speech  due  to  their  ability 
to consider discrepant, simultaneous representations. While the linguistic
constraint was clearly expected to play an important role in the detection of speaker
affect by 4-year-olds, it was less clear whether this constraint would continue to
influence performance at age 7. Seven-year-olds’ improved performance on
discrepant full speech was expected to be accompanied by a significant
improvement in filtered speech. In addition, an increment was expected in
performance on reiterant speech, reflecting a general developmental improvement
in sensitivity to vocal affect.

In  general,  it  was  expected  that  developmental  progressions  observed 
between 4 and 7 years of age would also be evident at 10 years of age. Based
on previous research, it was expected that 10-year-olds would be more sensitive
than  7-year-olds  to  the  paralinguistic  content  of  discrepant  full  speech.  A 
corresponding  improvement  between  ages  7 and  10  was  expected  for 
performance  on  filtered  speech  stimuli. 


CHAPTER  3 
METHODS 

Subjects 

Listeners  were  90  female  children  recruited  from  University  housing  and  from 
local schools and day-care centers. Only female listeners were employed in the
interest  of  establishing  the  presence  of  a genuine  developmental  phenomenon 
prior  to  investigating  potential  gender  differences.  All  children  were  native 
speakers  of  American  English  and  had  normal  hearing  as  assessed  by  parental 
report.  Each  child  received  an  honorarium  of  $5.00  for  research  participation. 
Data  from  six  children  were  not  included  in  analyses  because  their  ages  fell 
outside  the  desired  age  ranges  for  the  current  study.  There  were  28  children  in 
each of three age groups: 4-year-olds (M = 4;7, range 4;1 to 4;11), 7-year-olds
(M = 7;7, range 7;0 to 8;1), and 10-year-olds (M = 10;8, range 10;1 to 11;10).

Stimulus  Development 
Full  Speech  and  Reiterant  Stimuli 

Eighteen  sentences  were  taken  from  transcripts  of  mothers  talking  to  their 
children.  The  semantic  content  of  these  sentences  had  been  rated  by  adults  and 
by  7-  to  9-year-old  children  in  a previous  study  as  representing  either  happiness 
or  anger  at  an  inter-rater  agreement  of  .80  (Friend  & Becker,  1987,  1994).  Half 
of  the  sentences  had  been  rated  as  happy  and  half  had  been  rated  as  angry.  For 
the purposes of the current study, an additional five sentences were selected
which had previously been rated as affectively neutral (D.A. Bugental, personal
communication, October 17, 1984). Each of the sentences was written on a 5" x 7"
index card to be read by the experimenter.

The  original  and  reiterant  stimuli  were  generated  over  a two-day  period  by  the 
experimenter,  who  is  female  and  a native  speaker  of  American  English.  On  the 
first  day,  all  of  the  happy  and  angry  sentences  were  read  in  a positive/happy 
voice  and  on  the  second  day,  all  of  the  happy  and  angry  sentences  were  read  in 
a negative/angry  voice.  The  neutral  sentences,  which  were  intended  as  practice 
stimuli  for  the  observers,  were  read  in  a neutral  voice  at  the  beginning  of  the  first 
recording  session.  The  stimuli  were  read  into  a Shure  Model  SM81  condenser 
microphone  and  recorded  onto  Maxell  UD  35-90  reel-to-reel  tape  using  a Teac 
Model  A-3300SX  laboratory  tape  recorder.  The  microphone  was  affixed  to  the 
experimenter’s  head  at  a constant  distance  of  15.3  cm  from  the  experimenter’s 
mouth.  This  was  done  to  control  for  artifactual  changes  in  stimulus  amplitude 
(e.g.,  head  movement). 

A Visi-Pitch Model 6087 DS analog pitch extracting device was used to
provide measures of F0 and relative amplitude. As each sentence was read, F0
was plotted as a function of time on the screen of the Visi-Pitch. Following the
reading of each sentence, a reiterant stimulus was generated. The reiterant
stimulus was produced by replacing each syllable of the original sentence with
one of three meaningless syllables (ma, ba, or sa). The basis for choosing one
of these syllables to replace a syllable in the original sentence was the similarity
of F0 contours for the original and reiterant syllables. A list of these syllables as
a function  of  the  first  letter  of  the  replaced  syllable  is  presented  in  the  Appendix. 

The  purpose  of  using  reiterant  speech  was  to  create  a stimulus  as  acoustically 
similar  to  the  original  stimulus  as  possible  without  the  use  of  meaningful  semantic 
content.  The  determination  of  acoustic  similarity  was  made  in  two  stages.  First, 
a plot of the F0 contour for the reiterant stimulus was obtained on the screen of
the Visi-Pitch below the F0 contour of the original stimulus. Several
original-reiterant stimulus pairs were generated for each stimulus sentence. Figure 3-1
provides the F0 contour for the sentence, "You’re my favorite person", read in
happy intended affect and for its reiterant counterpart, "ma ma sasama basa."

Next, the acoustic measures F0 mean, F0 range, dB mean, and dB standard
deviation (relative dB measures obtained for F0 only) were taken for each
member of those pairs of original and reiterant stimuli which had the most similar
F0 contours. Difference scores were calculated for these measures and
transformed into z-scores. The criterion for acoustic similarity was |z| = 1.5 for
each difference variable. Thus, an original-reiterant stimulus pair was chosen for
inclusion in the study if two criteria were met: 1) both stimuli produced visually
similar F0 contours, and 2) the stimuli differed in their values on four acoustic
variables by less than 1.5 standard deviations of the distribution of difference
scores. The average values of the difference variables and their standard
deviations  are  presented  in  Table  3-1.  These  criteria  were  met  for  12  stimulus 
pairs  (6  happy  and  6 angry).  The  final  stimulus  set  consisted  of  six  happy 
sentences  read  in  both  happy  and  angry  voice,  six  angry  sentences  read  in  both 
happy  and  angry  voice,  and  three  neutral  sentences  read  in  neutral  voice. 
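
The second selection criterion can be illustrated with a minimal sketch. It assumes that the difference scores for each measure were standardized against the distribution of difference scores for that measure, and that a pair was retained only when every measure fell within 1.5 standard deviations. The function names and the difference scores below are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values against their own mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def select_pairs(diffs_by_measure, criterion=1.5):
    """Return indices of pairs whose |z| is below the criterion on every measure."""
    z_by_measure = {name: z_scores(vals) for name, vals in diffs_by_measure.items()}
    n_pairs = len(next(iter(z_by_measure.values())))
    return [i for i in range(n_pairs)
            if all(abs(z[i]) < criterion for z in z_by_measure.values())]

# Hypothetical difference scores (original minus reiterant) for five candidate pairs.
diffs = {
    "f0_mean_hz": [2.0, 5.0, 7.0, 6.0, 30.0],
    "f0_range_hz": [4.0, 6.0, 8.0, 7.0, 9.0],
}
selected = select_pairs(diffs)  # the fifth pair is rejected on F0 mean
```

Only two of the four measures are shown to keep the illustration short; the same check would apply to the two dB measures.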

Stimulus digitization and filtering were conducted at Haskins Laboratories and
the Speech Perception Laboratory at the University of South Florida. The original
and reiterant stimuli were digitized at a sampling rate of 20kHz using a 12-bit
analog-to-digital converter and high-frequency preemphasis (Whalen, Wiley, Rubin
& Cooper,  1990).  A waveform  editing  program  was  used  to  remove  any  noise 
from  the  beginnings  and  ends  of  the  stimuli. 

Low-Pass  Filtered  Stimuli 

The  digitized  original  stimuli  were  low-pass  filtered  at  400Hz  using  cascading 
ILS  software  filters.  Filters  were  two  Butterworth  4-pole  filters,  each  with  a roll-off 
of  24dB/octave  resulting  in  a total  roll-off  of  48dB/octave.  That  is,  for  every  octave 
increase  in  frequency  above  400Hz,  there  was  a decrease  in  amplitude  of  48 
decibels.  A cutoff  frequency  of  400Hz  was  chosen  because  pilot  data  indicated 
that  observers  could  understand  some  of  the  words  in  stimuli  for  which  a higher 
cutoff  frequency  had  been  selected.  The  15  stimuli  generated  by  this  procedure 
have  the  sound  of  muffled  speech,  not  unlike  the  sound  of  someone  speaking 
behind a closed door. Stimuli were recorded onto Ampex Grand Master reel-to-reel
tape using a Teac Model A-3300SX laboratory tape recorder.
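
The roll-off arithmetic for the cascaded filters can be checked with a short sketch. It assumes ideal analog Butterworth magnitude responses, |H(f)|^2 = 1 / (1 + (f/fc)^(2n)), and does not reproduce the ILS software implementation; the function names are hypothetical.

```python
import math

def butterworth_attenuation_db(freq_hz, cutoff_hz, poles):
    """Attenuation (dB) of an ideal low-pass Butterworth filter at freq_hz."""
    ratio = freq_hz / cutoff_hz
    return 10.0 * math.log10(1.0 + ratio ** (2 * poles))

def cascade_attenuation_db(freq_hz, cutoff_hz=400.0, poles=4, stages=2):
    """Attenuation of identical filters in series: the dB values add."""
    return stages * butterworth_attenuation_db(freq_hz, cutoff_hz, poles)

# Drop in level across one octave well above the cutoff (1600 Hz to 3200 Hz).
single = butterworth_attenuation_db(3200, 400, 4) - butterworth_attenuation_db(1600, 400, 4)
total = cascade_attenuation_db(3200) - cascade_attenuation_db(1600)
```

Under these assumptions a single 4-pole stage falls by about 24 dB per octave and the two-stage cascade by about 48 dB per octave, in agreement with the values stated above. Note that the figure is an asymptotic slope: at the 400Hz cutoff itself each ideal stage is only about 3 dB down.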

Procedure 

Three  stimulus  sets  (reiterant,  low-pass  filtered,  full)  were  presented  to 
listeners.  Each  set  consisted  of  18  stimuli  ranging  from  1060.4  ms  to  2276.0  ms 
in  duration.  The  first  three  and  last  three  stimuli  in  each  set  were  read  in  a neutral 
voice  and  were  included  to  provide  listeners  with  a baseline  of  the  speaker’s  voice 
in  each  condition.  Of  the  12  remaining  stimuli,  6 were  consistent  utterances  (3 
were  happy  lexical  content  read  in  happy  voice,  3 were  angry  lexical  content  read 
in  angry  voice)  and  6 were  discrepant  utterances  (3  were  happy  lexical  content 
read in angry voice, and 3 were angry lexical content read in happy voice). The
vocal affect of the reiterant versions of the stimuli had been rated by adults in a
previous experiment at inter-rater agreements of .75 to .95 (M=.88) (Friend, 1991;
Friend  & Farrar,  1994).  Within  each  set,  the  stimuli  were  arranged  in  two  orders. 
One  order  was  random  with  the  restriction  that  no  more  than  two  stimuli 
representing  the  same  vocal  affect  (happy  or  angry)  occurred  in  succession.  The 
second  order  was  the  reverse  of  the  random  order.  (See  Table  3-2  for  a list  of  the 
stimuli.) 
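
The ordering restriction can be expressed as a small sketch. The source does not describe how the random order was generated, so the rejection-sampling approach, the function names, and the item labels below are assumptions made for illustration. Only the 12 test items are ordered; the neutral stimuli occupy fixed positions at the start and end of each set.

```python
import random

def max_run_length(labels):
    """Length of the longest run of identical adjacent labels."""
    longest = run = 1
    for prev, cur in zip(labels, labels[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def constrained_order(items, max_run=2, seed=0):
    """Shuffle (item, vocal_affect) pairs until no affect repeats more than max_run times."""
    rng = random.Random(seed)
    order = list(items)
    while True:
        rng.shuffle(order)
        if max_run_length([affect for _, affect in order]) <= max_run:
            return order

# Twelve hypothetical test items: six happy-voice and six angry-voice utterances.
items = [(f"item{i}", "happy" if i % 2 else "angry") for i in range(12)]
random_order = constrained_order(items)
reverse_order = random_order[::-1]
```

Reversing a sequence preserves its run lengths, so the reverse order satisfies the same restriction automatically.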

Stimulus  sets  were  presented  in  each  of  six  possible  orders  (filtered,  reiterant, 
full;  reiterant,  filtered,  full;  full,  filtered,  reiterant;  etc.).  Within  each  stimulus  set, 
listeners  heard  both  the  random  and  reverse  orders  for  a total  of  36  presentations. 
These orders were counterbalanced across listeners so that one listener heard the
stimuli  in  the  random  and  then  the  reverse  order,  for  example,  and  another 
listener  heard  the  stimuli  in  reverse  order  and  then  random  order. 
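
The counterbalancing scheme can be enumerated with a brief sketch. The six set orders follow directly from the three speech contexts; whether set order was fully crossed with the within-set order, and how listeners were assigned, is not stated in the text, so the crossing and the `assign` function are assumptions.

```python
from itertools import permutations, product

CONTEXTS = ("filtered", "reiterant", "full")
WITHIN_SET = (("random", "reverse"), ("reverse", "random"))

# All six orders in which the three stimulus sets can be presented.
context_orders = list(permutations(CONTEXTS))

# Hypothetical full crossing of set order with within-set order.
schedules = list(product(context_orders, WITHIN_SET))

def assign(listener_index):
    """Cycle listeners through the schedules so each is used about equally often."""
    return schedules[listener_index % len(schedules)]
```

With 28 listeners per age group and six set orders, each order would be heard by four or five listeners in a group under this kind of rotation.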

The  stimuli  were  presented  in  a single-walled  sound-attenuating  chamber 
(Tracoustics)  using  a Teac  Tascam  Model  32  reel-to-reel  tape  deck  and 
Beyerdynamic DT320 MkII stereo headphones. The playback level for each set
of stimuli was adjusted to maximize the similarity of stimulus intensities across
sets. Estimates of stimulus intensity were obtained with a Fluke 8010A digital
multimeter and converted to decibels using a reference calibration tone of
1000Hz. Peak stimulus intensities (in dB SPL) at the adjusted playback levels are
presented  in  Table  3-3. 

Using  a single-interval  two-alternative  forced-choice  procedure,  listeners  were 
asked  to  rate  whether  the  woman’s  voice  sounded  "happy"  or  "mad"  after  each 
stimulus  presentation  by  pointing  to  one  of  two  schematic  faces  representing 
these  emotions.  Children  were  told,  "On  this  tape  you  will  hear  recordings  of  a 
mother  talking  to  her  little  girl  in  a toy  store.  Sometimes  when  she  is  talking  she 
feels  happy  and  sometimes  she  feels  mad.  Your  job  is  to  tell  me  when  she  feels 
happy  and  when  she  feels  mad  by  pointing  to  one  of  these  faces.  Sometimes  it 
will  be  difficult  to  understand  what  she’s  saying  and  that’s  okay.  The  words  are 
not  really  important.  What’s  really  important  is  how  she  sounds."  These 
instructions  encouraged  children  toward  optimal  utilization  of  paralinguistic  cues. 
In essence, the instructions were designed to bias 4-year-olds’ performance
against  the  predictions  of  the  study. 

After  each  of  the  first  three  stimulus  presentations,  the  tape  was  stopped  and 
the  child  was  asked,  "How  does  she  feel?  Does  she  feel  happy  or  does  she  feel 
mad?"  Then  the  child  was  told  that  the  tape  would  be  left  running  and  to  point 
to  the  face  that  was  most  like  how  the  mother  felt  each  time  they  heard  an 
utterance.  If  the  child  became  confused  about  a stimulus,  the  tape  was  stopped 
and  the  question  repeated.  If  the  child  still  found  it  difficult  to  decide,  they  were 
asked  a second  question,  "Does  she  feel  more  happy  or  more  mad?"  If  listeners 
made  more  than  one  response  to  a stimulus,  they  were  asked,  "Does  she  feel 
happy or mad?" and their final response was recorded. After each set of 18
presentations (random or reversed), the experimenter prepared the tape for the
next set of stimuli while the child made a picture from construction paper and
stickers.

Table 3-1

Difference Score Means (and Standard Deviations) for
Reiterant-Original Stimulus Pairs

Mean difference score

        F0                       dB
Mean        Range        Mean        SD
6.8Hz       7.5Hz        3.3         0.8
(5.8Hz)     (4.8Hz)      (1.9)       (0.4)

Note. Difference scores correspond to |z| = 1.5 for each measure.


Table  3-2 


Lexical and Paralinguistic Content of Full Speech Stimuli

Item  Lexical Content                                Paralinguistic Content

 1.   That’s just a mirror                           neutral
 2.   I think tools are essential                    neutral
 3.   This is called burlap                          neutral
 4.   You play very well                             angry
 5.   You’ll never behave yourself                   angry
 6.   Don’t play around with me child                happy
 7.   Oh good you got them all                       happy
 8.   You’re my favorite person                      angry
 9.   You play very well                             happy
10.   You’re being punished, you stay right there    happy
11.   Oh good you got them all                       angry
12.   Don’t play around with me child                angry
13.   You’ll never behave yourself                   happy
14.   You’re being punished, you stay right there    angry
15.   You’re my favorite person                      happy
16.   That’s just a mirror                           neutral
17.   I think tools are essential                    neutral
18.   This is called burlap                          neutral

Table 3-3

Peak Stimulus Intensity Values (in dB SPL) as a Function
of Condition and Acoustic Valence

                                  Speech Context
Stimulus (and Valence)    Full      Reiterant   Filtered

 1 (neutral)              61.7      61.5        65.3
 2 (neutral)              59.2      60.3        62.6
 3 (neutral)              57.5      62.2        60.5
 4 (negative)             60.2      65.5        62.2
 5 (negative)             68.3      68.2        70.0
 6 (positive)             67.2      70.1        67.0
 7 (positive)             70.1      68.4        65.5
 8 (negative)             69.1      68.3        66.2
 9 (positive)             71.9      69.1        69.9
10 (positive)             69.3      67.7        68.2
11 (negative)             68.4      67.9        63.9
12 (negative)             67.6      67.5        64.8
13 (positive)             69.1      67.4        68.3
14 (negative)             69.6      68.5        68.4
15 (positive)             71.1      67.3        68.2

Figure 3-1

Fundamental Frequency Contours for the Sentence,
"You’re my favorite person,"
Read in Happy Affective Voice
(horizontal axis: Time, in msec)


CHAPTER  4 
RESULTS 


Data  Reduction  and  Preliminary  Analysis 

All analyses and planned comparisons were conducted at α = .01. Post hoc
tests were conducted using a Bonferroni procedure to maintain familywise α = .01.
In each speech context, children heard two types of consistent stimuli (happy
lexical content/happy voice, angry lexical content/angry voice) and two types of
discrepant stimuli (angry lexical content/happy voice, happy lexical content/angry
voice). To determine whether responses to these stimuli could be collapsed into
fewer categories, post hoc comparisons of the two types of consistent message
and the two types of discrepant message within each level of age and speech
context were conducted. Since none of these comparisons were significant,
children’s responses to these stimuli were collapsed into two categories:
consistent and discrepant.

An  initial  Speech  Context  (3)  X Discrepancy  (2)  X Age  (3)  X Order  (6)  repeated 
measures  mixed  models  MANOVA  was  conducted  on  the  number  of  accurate 
detections  of  vocal  affect  out  of  a possible  12  (see  Table  4-1  for  summary 
statistics).  There  were  main  effects  of  Speech  Context,  Discrepancy,  and  Age  and 
several  significant  interactions.  Specifically,  there  were  two-way  interactions  of 
Age  X Order,  Speech  Context  X Age,  Speech  Context  X Order,  Speech  Context 
X Discrepancy, and Discrepancy X Age. In addition, there were two significant
three-way interactions: Speech Context X Discrepancy X Age and Speech
Context X Discrepancy X Order.

Three  sets  of  follow-up  analyses  were  conducted.  First,  planned  comparisons 
were  conducted  to  test  the  prediction  that  4-year-olds,  but  not  7-year-olds,  would 
perform  at  chance  on  filtered  speech  and  below  chance  on  discrepant,  full 
speech.  Second,  planned  comparisons  were  conducted  on  the  Speech  Context 
X Discrepancy  and  Speech  Context  X Discrepancy  X Age  interactions  because 
they  provide  tests  of  the  predictions  implied  by  the  cognitive  and  linguistic 
constraints  and  because  the  higher-order  interaction  subsumes  all  other 
substantive  main  effects  and  interactions.  Finally,  post  hoc  comparisons  of  the 
order  effects  follow  the  planned  comparisons. 

Before  conducting  planned  comparisons,  subsets  of  the  data  were  visually 
examined  to  determine  if  analysis  of  the  full  data  set  was  justified  given  the 
presence  of  order  effects.  Average  performance  was  plotted  within-subjects  for 
the  full  data  set  and  between-subjects  for  the  speech  context  presented  first. 
Visual  inspection  of  Figure  4-1  reveals  a similar  pattern  of  effects  across  both  the 
full  and  reduced  data  sets  with  the  exception  of  the  point  representing  7-year- 
olds’  performance  on  discrepant  full  speech.  An  analysis  of  individual  data 
revealed  that  this  was  because  2 of  the  9 listeners  contributing  to  that  point  were 
sampled from the low end of the distribution. Removal of these two listeners
resulted  in  a nearly  identical  pattern  across  the  full  and  reduced  data  sets. 

Because  these  plots  yielded  such  similar  patterns,  planned  comparisons  were 
conducted  on  the  more  powerful,  full  data  set  (see  Table  4-2  for  means  and 
standard  deviations;  also  refer  to  Figure  4-1).  All  hypotheses  were  supported. 
First,  to  test  whether  performance  in  the  filtered  and  discrepant  full  speech 
contexts  differed  from  chance,  within-subjects  t-tests  were  conducted  in  which 
chance  performance  (6  out  of  12  judgments  correct)  was  subtracted  from  actual 
performance.  Recall  that  the  combined  influence  of  the  cognitive  and  linguistic 
constraints  was  expected  to  lead  4-year-olds  to  focus  almost  exclusively  on 
information  conveyed  by  lexical  content,  resulting  in  below  chance  sensitivity  to 
paralinguistic  cues  in  discrepant  full  speech.  In  addition,  the  bias  toward  the 
prosodic processing of F0 information was expected to result in chance
performance  in  the  filtered  speech  context.  As  predicted,  4-year-olds’ 
performance  on  discrepant  full  speech  stimuli  was  significantly  below  chance 
(t(27)=  -3.36,  p<.01)  and  their  performance  in  the  filtered  context  did  not  differ 
from  chance.  An  interesting,  and  unanticipated,  finding  was  that  7-year-olds’ 
performance  on  discrepant  full  speech  did  not  differ  from  chance. 
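
The comparison against chance can be sketched as a one-sample t statistic computed on the differences between each listener's score and the chance level of 6. The scores below are hypothetical and are not data from the study, and the function name is an assumption.

```python
import math
from statistics import mean, stdev

def t_against_chance(scores, chance=6.0):
    """One-sample t statistic and degrees of freedom for scores versus a chance level."""
    diffs = [s - chance for s in scores]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical numbers correct (out of 12) for eight listeners.
scores = [3, 4, 5, 4, 6, 3, 5, 2]
t, df = t_against_chance(scores)
```

A negative t indicates performance below chance, as reported for 4-year-olds on discrepant full speech; with 28 listeners per group the statistic has 27 degrees of freedom.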

Planned  comparisons  of  the  Speech  Context  X Discrepancy  interaction 
revealed that 4-year-olds (t(27)=8.32, p<.01 and t(27)=8.19, p<.01), 7-year-olds
(t(27)=5.61, p<.01 and t(27)=3.54, p<.01), and 10-year-olds (t(27)=6.47, p<.01
and t(27)=5.42, p<.01) were more sensitive to paralinguistic cues in consistent
and  discrepant  reiterant  speech  than  in  discrepant  full  speech.  This  is  consistent 
with  the  prediction  that  all  age  groups  would  perform  relatively  well  in  the  reiterant 
speech  context  due  to  the  fact  that  reiterant  speech  preserves  rich  paralinguistic 
information  but  no  lexical  content.  However,  the  Speech  Context  X Discrepancy 
X Age  interaction  makes  it  necessary  to  qualify  this  finding  with  regard  to  relative 
performance  in  the  filtered  speech  context. 

All  age  groups  performed  better  on  discrepant  reiterant  stimuli  than  on 
discrepant filtered stimuli (4-year-olds’ t(27)=5.02, p<.01; 7-year-olds’ t(27)=6.82,
p<.01; and 10-year-olds’ t(27)=5.00, p<.01). However, only 4-year-olds and
7-year-olds performed better on consistent reiterant stimuli than on consistent
filtered stimuli (t(27)=2.96, p<.01 and t(27)=3.96, p<.01). This may indicate a
confound  between  lexical  and  paralinguistic  cues  in  filtered  speech.  While  low- 
pass  filtering  removes  most  segmental  information,  it  is  possible  to  understand 
some  lexical  content,  particularly  after  repeated  presentations.  The  presence  of 
a difference  between  discrepant  filtered  and  reiterant  speech  but  not  between 
consistent filtered and reiterant speech for 10-year-olds may indicate that these
listeners  were  responding  at  least  in  part  to  lexical  cues  in  their  judgments  of 
filtered  speech. 

Additional  comparisons  were  conducted  to  further  assess  predictions 
regarding  the  relative  performance  of  4-,  7-,  and  10-year-olds  across  speech 
contexts. In general, improvements in performance were observed between ages
4 and 7 across speech contexts and levels of discrepancy, with the interaction of
speech context with discrepancy and age being apparent in differences in
performance from age 7 to age 10. This account of age-related effects will begin
with changes in performance between 4- and 7-year-olds and conclude with
differences in performance between 7- and 10-year-olds as a function of speech
context  and  discrepancy. 

As  predicted,  there  was  a significant  increment  in  sensitivity  to  the 
paralinguistic  content  of  discrepant  full  speech  from  4 to  7 years  of  age 
(t(54)=2.83, p<.01). This corresponds to an improved ability to maintain
simultaneously  discrepant  mental  representations  that  is  expected  to  occur 
between  4 and  5 years  of  age.  However,  as  mentioned  previously,  7-year-olds’ 
performance  in  this  context  was  still  not  above  chance,  suggesting  that  the 
linguistic  constraint  continued  to  bias  their  affective  judgments.  As  predicted  on 
the  basis  of  the  linguistic  constraint,  4-year-olds’  sensitivity  to  paralinguistic  cues 
in  both  consistent  and  discrepant  filtered  speech  was  significantly  below  that  of 
7-year-olds (ts(54)=4.79 and 3.42, respectively, p<.01). The improvement from
age 4 to age 7 in the detection of vocal affect in discrepant full speech parallels
a similar improvement in affect detection from F0 cues in filtered speech. There
was also a small, but significant improvement in the detection of vocal affect in
consistent full speech from age 4 to age 7 (t(54)=4.31, p<.01) as well as an
improvement in performance on both consistent and discrepant reiterant speech
(ts(54)=4.87 and 5.93, respectively, p<.01). These trends are presumed to reflect
a general,  age-related  improvement  in  affect  detection. 

Finally,  the  Speech  Context  X Discrepancy  X Age  interaction  was  especially 
apparent  with  regard  to  differences  in  performance  between  7-  and  10-year-olds. 
As  predicted,  10-year-olds  were  significantly  more  sensitive  to  the  paralinguistic 
content of discrepant full speech than were 7-year-olds (t(54)=2.42, p<.01). Also
as  predicted,  this  corresponded  to  an  increase  in  correct  detections  of  vocal  affect 
for  consistent  (t(54)=2.83,  p<.01)  and  discrepant  filtered  speech  from  age  7 to 
age  10  (t(54)=2.44,  p<.01).  There  was  no  difference  between  7-  and  10-year- 
olds  in  the  detection  of  vocal  affect  for  consistent  full  speech  or  for  either 
consistent  or  discrepant  reiterant  speech,  reflecting  the  fact  that  ceiling 
performance  in  these  conditions  was  achieved  by  7 years  of  age. 

The  general  pattern  of  performance  across  age,  speech  context,  and 
discrepancy  held  up  relatively  well  for  average  performance  on  individual  stimuli 
as  well  as  for  average  performance  across  stimuli.  Percent  correct  for  each 
stimulus  as  a function  of  discrepancy  and  speech  context  is  given  for  each  age 
group  in  Tables  4-3,  4-4,  and  4-5.  These  tables  reveal  that,  for  each  age  group, 
children  were  more  accurate  in  their  ratings  of  consistent  full  speech  than  in  their 
ratings of consistent reiterant or consistent filtered speech. For discrepant stimuli,
performance was generally best in the reiterant speech context and roughly
comparable  across  filtered  and  full  speech  contexts. 

As  indicated  previously,  the  between-subjects  effects  obtained  from  the 
reduced, first presentation, data set were descriptively similar to the
within-subjects effects obtained for the full data set. However, post hoc tests (familywise
α = .01) were conducted to elucidate the nature of the order effects. There were
Age X Order and Speech Context X Order interactions and a marginally significant
Speech Context X Discrepancy X Order interaction. Seven- and 10-year-olds
performed  better  than  4-year-olds  across  orders,  levels  of  discrepancy,  and 
speech  contexts.  However,  the  Age  X Order  interaction  indicates  that  the 
difference in overall performance between 4-year-olds and 7- and 10-year-olds
only  reached  significance  for  one  order  of  stimulus  presentation  (t(7)  = 10.63  and 
t(7)  = 11.36,  respectively,  p<.01).  Two  aspects  of  this  interaction  are  especially 
important  to  consider.  First,  the  number  of  listeners  in  each  cell  is  quite  small 
(Ns=4-6)  resulting  in  greater  between-cell  variability  than  might  be  obtained  with 
a larger  sample.  Even  so,  the  pattern  of  effects  across  orders  of  presentation  is 
remarkably  consistent.  A second,  and  related  point,  is  that  the  interaction  reflects 
an enhancement of, rather than a departure from, the existing trend for 7- and
10-year-olds to evidence greater paralinguistic sensitivity than 4-year-olds. (See
Table 4-6 for means and standard deviations.)

The Speech Context X Order interaction was subsumed by the Speech Context
X Discrepancy  X Order  interaction,  and  post  hoc  comparisons  were  conducted 
on  the  higher  order  interaction.  These  comparisons  revealed  that,  across  ages, 
children’s  detection  of  vocal  affect  for  consistent  full  speech  was  significantly 
better  than  detection  of  vocal  affect  for  discrepant  full  speech  for  four  out  of  six 
orders  of  stimulus  presentation  (see  Table  4-7  for  means  and  t values).  (Again, 
the  effect  of  order  was  to  enhance  an  existing  trend  in  the  data.) 

In  summary,  reducing  the  data  set  to  only  those  responses  obtained  from  the 
first  stimulus  set  led  to  patterns  of  effects  quite  similar  to  those  obtained  for  the 
full  data  set.  In  general,  listeners  appeared  to  reach  ceiling  performance  by  about 
age  7 for  reiterant  speech  (both  consistent  and  discrepant)  and  for  consistent  full 
speech.  In  contrast,  performance  improved  from  4 to  10  years  of  age  on  both 
consistent  and  discrepant  filtered  speech  and  on  discrepant  full  speech.  This 
cannot  be  attributed  to  a gradual  acquisition  of  the  ability  to  detect  vocal  affect, 
since  ceiling  performance  is  achieved  earlier  for  reiterant  speech.  Nor  can  it  be 
attributed  solely  to  a confound  between  lexical  and  paralinguistic  cues  in  filtered 
speech  since  age-related  improvements  in  performance  were  obtained  across 
consistent and discrepant utterances. Instead, this finding suggests a gradual
improvement in the ability to utilize specific cues (e.g., F0) which have both a
prosodic and a paralinguistic function.


Signal  Detection  Analysis 

The procedure utilized for data collection (single interval forced-choice) yields
a dichotomous response variable. These data are particularly well suited to a
signal detection analysis in which a d' statistic is calculated as a measure of
sensitivity to variations in paralinguistic content. An advantage of signal detection
analysis is that d' is adjusted for response bias, thus yielding a more accurate
measure of sensitivity than either number or percent correct. The measure is
calculated by subtracting the z-score associated with the proportion of observed
false alarms from the z-score associated with the proportion of observed hits. In
the present study, a hit was defined as the conditional probability of reporting
"angry" given "angry" paralanguage and a false alarm was defined as the
conditional probability of reporting "angry" given "happy" paralanguage. Thus,
d' = z(hits) - z(false alarms).
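
The definition of d' given above can be sketched in a few lines of Python. This is only an illustration: the function name and the trial counts are hypothetical, and the sketch assumes both proportions lie strictly between 0 and 1, since the z transform is undefined at 0 and 1.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Return d' = z(hits) - z(false alarms).

    z is the inverse of the standard normal cumulative distribution.
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical listener: reports "angry" on 10 of 12 angry trials (hits)
# and on 3 of 12 happy trials (false alarms).
sensitivity = d_prime(10 / 12, 3 / 12)
print(round(sensitivity, 2))
```

A listener who responds identically to angry and happy paralanguage has equal hit and false alarm rates, so d' is 0 regardless of how often "angry" is reported; this is the sense in which the measure is adjusted for response bias.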

In developmental research, the number of trials is limited by the attention span
of the listener and a pooled estimate of d' is often calculated across listeners and
trials. When d' is calculated on group data, the absolute value of the statistic can
be reduced by two types of individual variation. If listeners employ widely
different decision criteria (a difference of 1.5 units or greater) or vary greatly in
sensitivity (a difference of 2.0 units or greater), the d' estimate will be reduced
(Macmillan & Creelman, 1991; Macmillan & Kaplan, 1985). Within age groups,
listeners’ criteria, c' = 0.5 [p(hits) + p(false alarms)], differed by less than 1
unit. However, the sensitivity of individual listeners varied greatly.
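
The reduction in pooled d' when listeners differ in sensitivity can be illustrated with a small Python sketch. The two listeners and their response counts below are hypothetical and are chosen only to show the direction of the effect; they are not data from the present study.

```python
from statistics import NormalDist, mean

z = NormalDist().inv_cdf  # z transform of a proportion


def d_prime(hits, signal_trials, false_alarms, noise_trials):
    """d' from raw counts: z(hit rate) - z(false alarm rate)."""
    return z(hits / signal_trials) - z(false_alarms / noise_trials)


# Hypothetical listeners: (hits, signal trials, false alarms, noise trials).
# The first is highly sensitive, the second barely above chance.
listeners = [(95, 100, 5, 100), (60, 100, 40, 100)]

# Averaged estimate: compute d' per listener, then take the mean.
averaged = mean(d_prime(*counts) for counts in listeners)

# Pooled estimate: sum the counts over listeners, then compute one d'.
totals = [sum(column) for column in zip(*listeners)]
pooled = d_prime(*totals)

print(round(averaged, 2), round(pooled, 2))
```

Because the z transform is nonlinear, the d' of the pooled proportions is smaller than the mean of the individual d' values whenever sensitivities differ this widely, even though both estimates rank conditions in the same order.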

To determine if the reduction in absolute d' would influence planned
comparisons, both individual and group d' values were calculated for each combination
of speech context and discrepancy. Individual (averaged) and group (pooled)
estimates  of  sensitivity  were  plotted  and,  while  the  average  estimates  were 
somewhat  larger  than  the  pooled  estimates,  the  same  pattern  of  performance  was 
evident  across  levels  of  speech  context,  discrepancy,  and  age.  Thus,  in  addition 
to  the  fact  that  pooled  d'  is  a more  reliable  measure  than  average  d'  due  to  the 
larger  number  of  trials,  it  was  also  a good  relative  measure  of  sensitivity  in  the 
present  context.  Variance  was  calculated  and  statistical  significance  established 
using  the  series  of  equations  provided  for  group  data  by  Gourevitch  and  Galanter 
(1967).  Estimated  variance  was  obtained  with  the  equation, 

\[
\operatorname{var}(d') = \frac{p_1(1-p_1)}{n_1(\operatorname{ord} z_1)^2} + \frac{p_2(1-p_2)}{n_2(\operatorname{ord} z_2)^2}
\]

where

p_1 = p(happy|happy),

p_2 = p(happy|angry),

n_1 = number of happy trials,

n_2 = number of angry trials,
and

ord z_1 and ord z_2 = ordinates at z corresponding
to the observed proportions.

The test statistic is G and is given by the difference in d' values divided by the
square root of the sum of their estimated variances,

\[
G = \frac{d'_1 - d'_2}{\sqrt{\operatorname{var}(d'_1) + \operatorname{var}(d'_2)}}
\]

The statistic is distributed as z and critical values can be obtained from a normal
curve table. As in the previous analysis, α was set at .01.
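
As a rough illustration of the procedure just described, the following Python sketch computes the estimated variance of a pooled d' and the G statistic for a comparison between two groups. The function names, the proportions, and the trial counts are hypothetical, and the two-tailed criterion at α = .01 is an assumption made for the example rather than a detail stated above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def d_prime_with_variance(p1, n1, p2, n2):
    """Return (d', estimated variance of d') for two observed proportions.

    The ordinate at z is the height of the standard normal density at
    the z value corresponding to the observed proportion.
    """
    z1 = STD_NORMAL.inv_cdf(p1)
    z2 = STD_NORMAL.inv_cdf(p2)
    variance = (p1 * (1 - p1)) / (n1 * STD_NORMAL.pdf(z1) ** 2) + (
        p2 * (1 - p2)
    ) / (n2 * STD_NORMAL.pdf(z2) ** 2)
    return z1 - z2, variance


def g_statistic(d_a, var_a, d_b, var_b):
    """Difference in d' divided by the square root of the summed variances."""
    return (d_a - d_b) / sqrt(var_a + var_b)


# Hypothetical pooled proportions for two age groups, 168 trials per stimulus type.
d_young, var_young = d_prime_with_variance(0.55, 168, 0.45, 168)
d_old, var_old = d_prime_with_variance(0.80, 168, 0.25, 168)

g = g_statistic(d_old, var_old, d_young, var_young)
critical = STD_NORMAL.inv_cdf(1 - 0.01 / 2)  # two-tailed critical z, alpha = .01
significant = abs(g) > critical
print(round(g, 2), significant)
```

Because G is referred to the normal curve, no degrees of freedom are needed; the precision of each d' enters only through its estimated variance, which shrinks as the number of pooled trials grows.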

The signal detection analysis completely corroborated the preliminary analysis
based on number correct (see Figure 4-2). As predicted, confidence intervals
established around 4-year-olds’ responses to consistent (d'=0.25) and discrepant
filtered speech (d'=0.15) indicated that their performance was at chance.
Seven-year-olds were significantly more sensitive than 4-year-olds to variations in the
paralinguistic content of both consistent (d'=1.52, G=6.15, p<.01) and
discrepant (d'=0.98, G=4.16, p<.01) filtered speech. This effect corresponded
to below-chance sensitivity to variations in the paralinguistic content of discrepant
full speech for 4-year-olds (d'=-0.69) and a significant increment in sensitivity for
7-year-olds (d'=0.28, G=4.96, p<.01). (Confidence intervals established around
the d' estimates for discrepant full speech confirmed that 4-year-olds’
performance was significantly below chance and that 7-year-olds’ performance
did not differ from chance.) There was also significant improvement in the
detection of vocal affect in consistent full speech from age 4 (d'=2.39) to age 7
(d'=4.77, G=4.97, p<.01) and in consistent (d'=1.11 and d'=2.53) and
discrepant reiterant speech (d'=1.13 and d'=3.06, Gs=6.01 and 7.47,
respectively, p<.01).

Ten-year-olds were significantly more sensitive to variations in the
paralinguistic content of discrepant full speech (d'=1.27) than were 7-year-olds
(d'=0.28, G=4.87, p<.01). As predicted, this corresponded to an increase in
correct detections of vocal affect for consistent filtered speech from age 7
(d'=1.51) to age 10 (d'=2.35, G=3.52, p<.01) and for discrepant filtered
speech from age 7 (d'=0.98) to age 10 (d'=1.68, G=3.36, p<.01). As in the
previous analysis, there was no difference between 7- and 10-year-olds in the
detection of vocal affect for consistent full speech or for either consistent or
discrepant reiterant speech.

Again,  listeners  reached  near-ceiling  performance  by  age  7 for  reiterant 
speech  (consistent  and  discrepant)  and  for  consistent  full  speech.  Performance 
improved from 4 to 10 years of age on both consistent and discrepant filtered
speech  and  on  discrepant  full  speech.  The  signal  detection  analysis  provides 
converging  evidence  for  the  pattern  of  effects  obtained  from  the  MANOVA 
approach.  Response  bias  did  not  alter  estimates  of  relative  performance  across 
levels  of  age,  speech  context,  and  discrepancy. 

Individual  Differences 

Individual patterns of performance across age groups suggested that 7-year-olds
were more varied in their sensitivity to paralinguistic cues in discrepant full
and filtered speech than either 4-year-olds or 10-year-olds. It is possible, then,
that 7-year-olds represent a transitional period in the proposed shift from prosodic
to paralinguistic function. To assess this possibility, median tests were performed
on each combination of speech context and discrepancy (Freund, 1973). The
procedure consisted of finding the median across groups for each condition and
counting the number of cases above and below the median within groups. The
numbers of cases falling above and below the median for each condition are
presented in Table 4-8 with χ² values. For each combination of speech context
and discrepancy, χ² was significant, indicating that the observed frequencies
falling above and below the median across ages differed from chance.
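
The median test can be sketched in Python as follows. The function name is hypothetical, and the expected frequency used here (half of each age group, or 14 cases, in every cell) is an assumption inferred from the fact that it reproduces the value tabled for consistent filtered speech; the text does not state how expected frequencies were obtained.

```python
def median_test_chi_square(below, above):
    """Chi-square for counts falling below and above the grand median.

    `below` and `above` hold one count per age group. Each group's
    expected count is assumed to be half of its cases in each cell.
    """
    chi_square = 0.0
    for n_below, n_above in zip(below, above):
        expected = (n_below + n_above) / 2
        chi_square += (n_below - expected) ** 2 / expected
        chi_square += (n_above - expected) ** 2 / expected
    return chi_square


# Counts from Table 4-8, consistent filtered speech,
# listed for 4-, 7-, and 10-year-olds.
below_median = [22, 11, 4]
above_median = [6, 17, 24]
print(round(median_test_chi_square(below_median, above_median), 2))  # 24.71
```

Most of the statistic comes from the youngest and oldest groups, whose counts lie far from an even split; the 7-year-olds, at 11 below and 17 above, contribute very little.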

Of particular interest was whether 7-year-olds’ sensitivity was evenly
distributed about the median relative to 4- and 10-year-olds. Additional χ²
analyses were conducted for each age group. The results indicated that the
distribution of performance about the median differed significantly from chance
for 4-year-olds (χ²(5, N=28) = 59.00, p<.01), 7-year-olds (χ²(5, N=28) = 32.00,
p<.01), and 10-year-olds (χ²(5, N=28) = 42.57, p<.01) across levels of speech
context and discrepancy. However, because it appeared that good performance
on consistent full speech stimuli may have accounted for much of this effect for
the 7-year-old sample, χ² values were also calculated for each group omitting the
consistent full speech condition. These results indicated that the distribution of
performance about the median differed significantly from chance for 4-year-olds
(χ²(4, N=28) = 58.43, p<.01) and 10-year-olds (χ²(4, N=28) = 42.57, p<.01) but not
for 7-year-olds. This suggests that the 7-year-old sample is actually composed
of one group whose performance is most consistent with that of the 4-year-old
sample and one group whose performance is most consistent with that of the
10-year-old sample. However, to make a strong case for this interpretation, it is
necessary to show that the same individuals perform at the same relative level
across conditions.

To determine the extent to which the paralinguistic sensitivity of individual
7-year-olds to discrepant full speech could be predicted on the basis of their
sensitivity to filtered speech, children were cross-classified as above or below
median sensitivity for discrepant full and discrepant filtered speech. The purpose
was to compare individual listeners’ responses across different instantiations
(filtered and full speech) of the same set of (discrepant) stimuli. Twenty of
twenty-eight 7-year-olds fell in the same relative position in performance on both
conditions. That is, for 71% of individual 7-year-old listeners, relative sensitivity
was roughly equivalent across filtered and full speech versions of discrepant
stimuli. There was no difference in age (e.g., young 7-year-olds versus older
7-year-olds) which could account for this effect. Individual performance across
these speech contexts was also similar for 4- and 10-year-olds (see Table 4-9).
This is consistent with the correspondence between utilization of F0 as a
paralinguistic or prosodic cue and sensitivity to vocal affect in cases of
discrepancy predicted by the linguistic constraint. Importantly, this finding
suggests that the 7-year-old sample was a transitional group composed about
equally of two subgroups of children: those who evidenced low sensitivity to
affective variations in both filtered and full discrepant speech and those who
responded with high sensitivity across filtered and full speech contexts.
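
The cross-classification logic can be sketched in Python. The per-listener values below are hypothetical: they are arranged to match the reported totals (16 of 28 above the median in each discrepant condition, 20 of 28 in the same relative position in both) but do not reproduce the individual cases.

```python
def concordance(above_filtered, above_full):
    """Count listeners classified the same way in both conditions.

    Each argument holds one boolean per listener: True if that listener
    fell above median sensitivity in the condition, False otherwise.
    Returns (number concordant, proportion concordant).
    """
    same = sum(f == u for f, u in zip(above_filtered, above_full))
    return same, same / len(above_filtered)


# Hypothetical classification of 28 seven-year-olds.
above_filtered = [True] * 16 + [False] * 12
above_full = [True] * 12 + [False] * 4 + [True] * 4 + [False] * 8

count, proportion = concordance(above_filtered, above_full)
print(count, round(100 * proportion))  # 20 71
```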

Summary  of  Findings 

The  MANOVA  and  signal  detection  analyses  yielded  identical  pictures  of  age 
differences  in  sensitivity  to  affective  variations  across  speech  contexts  and  levels 
of  discrepancy.  As  predicted,  4-year-olds  were  more  accurate  in  their  attributions 
of  affect  when  listening  to  reiterant  speech  relative  to  filtered  speech.  This  effect 
was also obtained for 7-year-olds. However, for 10-year-olds, performance was
better  for  reiterant  than  for  filtered  speech  only  when  the  stimuli  were  discrepant, 
suggesting  that  10-year-olds  may  have  based  some  of  their  judgments  of  filtered 
speech  on  lexical  as  well  as  paralinguistic  cues.  Children  across  age  groups 
performed  better  on  both  consistent  and  discrepant  reiterant  speech  than  on 
discrepant  full  speech.  By  7 years  of  age,  listeners’  sensitivity  to  affective 
variations  in  reiterant  speech  and  in  consistent,  full  speech  approached  ceiling 
(90-99%  correct).  Also  as  predicted,  4-year-olds  performed  poorly  on  both  filtered 
speech  and  on  discrepant  full  speech  and  performance  in  these  conditions 
improved  across  the  range  of  ages  in  the  present  study. 


Analysis of within-subjects sensitivity reveals that the transition from the
prosodic to the paralinguistic use of cues such as F0 may occur at about 7 years
of age. As a group, 7-year-olds’ performance was more varied than that of 4- or
10-year-olds in filtered and discrepant full speech. However, the performance of
individual listeners was remarkably consistent across filtered and full speech
versions of the same set of discrepant stimuli, suggesting that the linguistic
constraint may be explanatory at the level of individual sensitivity. If the prosodic
function of F0 contributes to children’s difficulty in reporting paralinguistic cues in
discrepant messages, then just such a correspondence between sensitivity to
filtered speech (in which judgments are based upon variations in F0) and
discrepant full speech would be expected.


Table 4-1

Summary Table for Speech Context X Discrepancy X Age X Order
Repeated Measures MANOVA

Source                        df          F

Speech Context (SC)           2, 65       56.58**
Age                           2, 66       80.83**
Order                         5, 66        1.20
Age X Order                   10, 66       3.12**
Speech Context X Age          4, 132       3.60**
Speech Context X Order        10, 132      5.35**
SC X Age X Order              20, 132      1.41
Discrepancy (D)               1, 66       94.86**
Discrepancy X Age             2, 66        4.56*
Discrepancy X Order           5, 66        2.23
D X Age X Order               10, 66       1.10
SC X D                        2, 65      107.46**
SC X D X Age                  4, 132       4.67**
SC X D X Order                10, 132      2.27*
SC X D X Age X Order          20, 132      1.43

Note. **p<.01 *p<.02.


Table 4-2

Mean Number Correct (and Standard Deviations) as a Function of Age,
Speech Context, and Discrepancy

                                      Age

Speech Context      4-year-olds      7-year-olds      10-year-olds

Filtered
  Consistent         6.60 (2.48)      9.29 (1.60)     10.50 (1.60)
  Discrepant         6.35 (1.85)      8.25 (2.27)      9.61 (1.87)

Reiterant
  Consistent         8.53 (2.19)     10.75 (1.00)     10.50 (1.26)
  Discrepant         8.57 (2.16)     11.25 (1.00)     11.50 (0.69)

Full
  Consistent        10.53 (1.69)     11.92 (0.26)     11.75 (0.58)
  Discrepant         4.43 (2.47)      6.75 (3.58)      8.79 (2.62)

Note. The maximum possible number correct is 12.


Table 4-3

Percent Correct Judgments of Paralinguistic Content by 4-year-olds
As a Function of Stimulus and Speech Context


Table 4-4

Percent Correct Judgments of Paralinguistic Content by 7-year-olds
As a Function of Stimulus and Speech Context


Table 4-5

Percent Correct Judgments of Paralinguistic Content by 10-year-olds
As a Function of Stimulus and Speech Context


Table 4-6

Mean Number Correct (and Standard Deviations) As a Function of Age
and Order of Presentation

                                         Mean Number Correct

Order                            4-year-olds     7-year-olds     10-year-olds

1 (filtered, reiterant, full)    7.33 (0.36)     10.00 (0.98)     9.23 (0.58)
2 (filtered, full, reiterant)    7.80 (0.67)      9.00 (0.83)    10.12 (0.32)
3 (reiterant, full, filtered)    7.08 (1.57)      9.33 (0.59)    10.80 (0.73)
4 (reiterant, filtered, full)    6.42 (0.78)     10.73 (0.42)    11.00 (0.41)*
5 (full, filtered, reiterant)    7.75 (1.45)      9.50 (1.26)    10.93 (0.69)
6 (full, reiterant, filtered)    8.43 (1.36)      9.54 (0.81)     9.23 (0.58)

Note. * indicates 7- and 10-year-olds’ performance greater than 4-year-olds’
at familywise α = .01.


Table 4-7

Mean Differences in Ratings (Consistent minus Discrepant Full Speech)
As a Function of Order of Presentation

Order                            Mean Difference     t

1 (filtered, reiterant, full)    4.75                5.24*
2 (filtered, full, reiterant)    5.57                5.95*
3 (reiterant, full, filtered)    4.77                3.60
4 (reiterant, filtered, full)    2.71                3.33
5 (full, filtered, reiterant)    3.92                6.90*
6 (full, reiterant, filtered)    6.93                7.23*

Note. * indicates familywise p<.01.


Table 4-8

Individual Cases Falling Above and Below Median Sensitivity and Chi-Squares
As a Function of Age, Speech Context, and Discrepancy

                   Filtered Speech           Reiterant Speech          Full Speech

Age                consistent  discrepant    consistent  discrepant    consistent  discrepant

4-year-olds
  below            22.0        24.0          22.0        23.0          16.0        24.0
  above             6.0         4.0           6.0         5.0          12.0         4.0

7-year-olds
  below            11.0        12.0           7.0         9.5           2.0        12.0
  above            17.0        16.0          21.0        18.5          26.0        16.0

10-year-olds
  below             4.0         6.0           8.0         7.0           5.0         7.0
  above            24.0        22.0          20.0        21.0          23.0        21.0

χ²(2, N=84) =      24.71       24.00         21.28       21.46         32.71       19.14

Note. All values are significant at p<.01.


Table 4-9

Cross-classification of Individual Cases Falling Above and Below Median
Sensitivity in Discrepant Filtered and Full Speech Contexts


Figure 4-1

Comparison of Effects for the Full Data Set (a)
and First Presentation Only (b)
[Figure; vertical axis: Mean Correct]


Figure 4-2

Paralinguistic Sensitivity as a Function of Age,
Speech Context, and Discrepancy


CHAPTER  5 
DISCUSSION 

The observed developmental changes in sensitivity to affective variations in
speech from 4 to 10 years of age are consistent with a model of vocal affect
perception based on cognitive and linguistic constraints. As the cognitive
constraint on dual representations would predict, in cases of discrepancy,
4-year-olds made affective attributions based on the valence of a single stimulus
dimension and, in accordance with the linguistic constraint, this dimension was
lexical content. As a result, 4-year-olds’ poorest performance was obtained for
discrepant full speech stimuli. Converging evidence for the existence of the
linguistic constraint was provided by 4-year-olds’ chance sensitivity to affective
variations in filtered speech, which required them to make judgments based on F0
alone. Alternative accounts of children’s sensitivity in cases of discrepancy which
rely solely upon the gradual acquisition of paralinguistic competence are rendered
untenable by the fact that 4-year-olds performed relatively well (71% correct) in
the reiterant speech context. The pattern of performance is clear: 4-year-olds are
sensitive to affective variations in speech when such variations are carried by cues
other than, or in addition to, fundamental frequency. They are insensitive to
affective variations carried by F0 alone, and when faced with affective discrepancy
between lexical and paralinguistic cues, 4-year-olds are unable to selectively
attend to information carried paralinguistically.

Seven-year-olds’  sensitivity  to  paralinguistic  cues  in  discrepant  full  speech  was 
significantly  greater  than  that  of  4-year-olds  although,  interestingly,  7-year-olds’ 
sensitivity  was  not  above  chance  in  this  condition.  The  analysis  of  individual 
patterns  of  sensitivity  revealed  that  this  was  due  to  wide  variation  in  the 
performance  of  individual  children  across  the  range  of  possible  scores.  About 
one-half  of  the  listeners  in  this  age  range  performed  similarly  to  4-year-olds  (i.e., 
basing  their  attributions  primarily  on  lexical  content)  while  the  other  half  performed 
more similarly to 10-year-olds (basing their attributions primarily on paralinguistic
cues).

Even though 7-year-olds are not limited by constraints on dual representation,
they may be biased toward attributions based on lexical cues due to continued
operation of the linguistic constraint. In fact, 7-year-olds appear to represent a
transitional group with respect to utilization of F0 as a paralinguistic cue and
sensitivity to paralanguage in the presence of competing lexical content. This
interpretation is corroborated by the fact that the same 7-year-olds who showed
low sensitivity to paralinguistic cues in discrepant full speech also showed low
sensitivity to F0 variations in filtered speech. Seven-year-olds who performed well
on discrepant full speech also performed well on filtered speech, suggesting that
the ability to use F0 paralinguistically accounts in part for this effect.

Again, as was the case with 4-year-olds, this upward shift in sensitivity to vocal
paralanguage in filtered and full speech contexts cannot be attributed solely to
general developmental changes in the accuracy of affect detection. In the
reiterant speech context, 7-year-olds’ group performance was near ceiling
(87%-94% correct) and 71% of 7-year-olds performed above median sensitivity. The
ability of 7-year-olds to selectively attend to paralinguistic cues in the presence
of competing lexical information appears yoked to their ability to utilize F0 alone,
and sensitivity in each of these contexts appears to develop much more slowly
than sensitivity to paralanguage consisting of both prosodic (F0, duration,
intensity) and nonprosodic (voice quality) acoustic constituents.

Continued increments in sensitivity to affective variations carried by F0 in
filtered speech and in the presence of competing lexical information were
observed for the 10-year-old sample. As for younger children, sensitivity across
these contexts appeared to be related for both group and individual performance.
This is consistent with predictions implied by the linguistic constraint, although the
constraint appears to be operating over a much longer period of development
than was anticipated at this study’s inception. This is perhaps not unreasonable
given the evidence that prosodic variations play an important role in the facilitation
of speech comprehension even among adults (Cutler & Foss, 1977; Cutler &
Norris, 1988). In addition, Aitchison and Chiat (1981) have reported an advantage
for stressed syllables among 4- to 9-year-olds engaged in the learning of novel
words. For these reasons, bias toward the prosodic processing of specific
acoustic variables may influence paralinguistic sensitivity in specific contexts (e.g.,
filtered and discrepant speech) over an extended developmental period.

There was no corresponding improvement in sensitivity to reiterant speech
from 7 to 10 years of age. As mentioned previously, near-ceiling performance in
the reiterant speech context was achieved by age 7, suggesting that the ability to
decode affective information from acoustic cues is in place before age 10 and
cannot explain observed changes in sensitivity in the filtered and full speech
contexts.

The  constraint  on  dual  representations  and  the  constraint  on  the  processing 
priority  of  prosody  versus  paralanguage  can  account  in  large  part  for 
developmental  changes  in  sensitivity  to  vocal  paralanguage  in  those  cases  in 
which  it  conflicts  with  lexical  cues.  Clearly,  however,  there  are  other  types  of 
explanation  that  might  function  alone  or  in  conjunction  with  these  constraints  to 
produce  a similar  pattern  of  results.  In  the  next  two  sections,  the  focus  of  the 
discussion  will  be  on  the  explication  of  the  proposed  cognitive  and  linguistic 
constraints.  A final  section  will  address  the  limitations  of  the  current  approach 
and  the  implications  for  future  research. 

The Nature of the Cognitive Constraint

As developmental theorists have recently argued, the positing of constraints
as explanatory constructs for developmental change must be accompanied by a
clarification of the origin of the constraint and the level at which the constraint is
operating (Campbell & Bickhard, 1992; Keil, 1990). Importantly, Campbell and
Bickhard distinguish between constraints on processing resources and constraints
on reflecting on representations (or "knowing-levels," pp. 313-317). In an analysis
of children’s appearance-reality errors across a wide range of tasks, Flavell,
Flavell, and Green (1983) concluded that children’s difficulty had to do with
metarepresentational limitations (i.e., limitations on the ability to reflect on one’s
mental representations). This is consistent with the knowing-levels constraint
proposed by Campbell and Bickhard. Accordingly, the constraint on dual
representations, in the present context, is conceived of as a domain-general
constraint on metarepresentation or knowing-levels. It is expected that children
4 years of age and younger do, in fact, process simultaneously discrepant
stimulus dimensions but are unable to reflect on these representations. Of
course, this constraint is most likely necessitated by limitations on processing
capacity although it is not a capacity constraint, per se.

This constraint is expected to modulate information processing from the onset
of mental representations (approximately 5 months of age when the first evidence
of object permanence is present; Baillargeon, Spelke, & Wasserman, 1985) to
approximately 4 years of age when children show considerable difficulty reporting
simultaneous representations on some tasks (Flavell, 1986; Friend & Davis, 1993;
Gopnik & Astington, 1988; Harris, Donnelly, Guz, & Pitt-Watson, 1986). However,
it is expected to have no effect on sensitivity to vocal paralanguage prior to the
acquisition of a receptive vocabulary during the latter part of the infant’s first year
(and then, only in cases of discrepancy between paralinguistic and lexical cues).

In  the  context  of  the  present  effort,  the  constraint  on  dual  representations  was 
expected  to  influence  the  paralinguistic  sensitivity  of  4-year-olds  to  affective 
variations  in  discrepant  full  speech  by  causing  them  to  focus  on  only  a single 
stimulus  dimension.  This  constraint  does  not  predict  which  dimension  will  receive 
representational  priority.  That  is,  4-year-olds  would  be  expected  to  focus  on 
either  paralanguage  or  lexical  cues,  but  not  both.  If  performance  were  governed 
by  this  constraint  only,  and  if  both  paralinguistic  and  lexical  cues  are  equally 
salient,  then  children  would  be  expected  to  base  approximately  half  of  their 
affective  judgments  on  paralinguistic  cues.  An  additional  constraint  is  required 
to  account  for  the  tendency  of  4-year-olds,  and  some  7-year-olds,  to  report 
speaker  affect  consistent  with  lexical,  but  not  paralinguistic,  cues. 

The  Nature  of  the  Linguistic  Constraint 

The  linguistic  constraint  makes  a specific  prediction  about  which  set  of  cues 
(lexical  or  paralinguistic)  children  will  use  as  a basis  for  a judgment  of  speaker 
affect.  Like  the  constraint  on  dual  representations,  the  linguistic  constraint  derives 
from developmental capacity limitations. However, in contrast to the constraint
on  dual  representations,  the  linguistic  constraint  is  conceived  of  as  a domain- 
specific  constraint  on  the  nature  of  systems  being  constructed.  Specifically,  the 
linguistic  constraint  is  a function  of  the  language  acquisition  process,  acting  to 
selectively  focus  attention  on  specific  auditory  signals  that  are  relevant  to  the 
efficient  processing  of  the  child’s  native  language.  Importantly,  the  linguistic 
constraint  applies  generally  to  all  cases  in  which  a discrepancy  exists  between 
lexical  and  paralinguistic  cues,  not  just  those  cases  relevant  to  affect  detection. 

Because  this  constraint  is  viewed  as  developing  out  of  the  language 
acquisition  process,  and  because  available  empirical  data  suggest  that 
prelinguistic  infants  are  sensitive  to  paralinguistic  variations,  the  constraint  is 
expected  to  influence  the  perception  of  discrepant  auditory  messages  at  about 
the  time  that  infants  acquire  the  first  few  items  of  their  receptive  vocabularies. 
Even  before  the  acquisition  of  specific  lexical  items,  the  linguistic  constraint 
should  direct  infant  attention  to  important  lexical  units  and  lexical  contrasts,  thus 
facilitating  acquisition.  This  view  is  consistent  with  the  model  proposed  by 
Fernald (1991, 1992) in which the initial function of acoustic modifications in
speech  is  the  unconditioned  elicitation  of  specific  infant  states.  By  the  end  of  the 
first  year,  however,  acoustic  modifications  serve  to  highlight  or  emphasize 
semantic,  syntactic,  and  grammatical  units. 

There are three important distinctions between this view and the model
proposed by Fernald (1991, 1992). First, in Fernald’s model, the level at which
auditory  processing  biases  operate  to  constrain  development  has  not  been 
articulated.  Second,  in  contrast  to  the  current  approach,  Fernald  appears  to  view 
acoustic  modifications  of  speech  as  primarily  prosodic  in  character  as  opposed 
to  conveying  both  prosodic  and  paralinguistic  information.  Finally,  in  its  current 
conceptualization,  the  linguistic  constraint  continues  to  bias  auditory  processing 
toward  language-relevant  information  beyond  the  initial  acquisition  of  a receptive 
vocabulary  and  into  early  childhood. 

Recent  approaches  to  language  acquisition  and  developmental  speech 
perception  which  emphasize  the  role  of  general  auditory  mechanisms  are 
consistent  with  this  account.  In  particular,  Jusczyk  and  Bertoncini’s  (1988)  notion 
of  speech  perception  as  an  innately  guided  learning  process  offers  a 
psychobiological  account  of  early  sensitivity  to  speech  contrasts.  According  to 
this  view,  infants  are  predisposed  to  attend  to,  process,  and  encode  information 
that  is  relevant  to  the  utterances  of  their  native  language.  This  work  could  be 
accomplished  by  general  auditory  mechanisms  which  are  optimally  sensitive  to 
specific  sorts  of  acoustic  input.  A developmental  predilection  for  prosodic 
contrasts  could  easily  emanate  from  the  nature  of  the  prenatal  auditory 
environment  (Cooper  & Aslin,  1989;  DeCasper  & Spence,  1991).  In  addition, 
prosodic  features  are  good  candidates  for  special  processing  status  because  they 
facilitate  speech  processing  and  organize  the  individual  units  of  the  speech
stream  into  a perceptual  whole  (Cole  & Scott,  1974). 

In  the  current  model,  the  meaningfulness  of  vocal  signals  to  the  young  infant 
is  given  by  their  paralinguistic,  rather  than  their  lexical  or  prosodic,  acoustic 
content.  With  development  and  exposure,  increasing  differentiation  between 
speech  and  nonspeech  sounds  results  in  the  construction  of  a system  for  the 
efficient  processing  and  analysis  of  native  speech.  Certain  acoustic  variables 
such  as  F0,  duration,  and  intensity  perceptually  highlight  the  new  units  of  meaning
(the  lexicon)  rather  than  conveying  meaning  themselves.  (In  fact,  in  the  absence 
of  recognizable  speech,  F0  appeared  to  convey  no  information  to  the  youngest
listeners  in  the  current  study.)  With  increases  in  processing  efficiency  and 
capacity,  the  bias  toward  the  prosodic  processing  of  acoustic  variables  should 
decrease.  Because  the  ability  to  process  paralinguistic  information  is  maintained 
throughout  development  (witness  children’s  high  sensitivity  to  reiterant  speech 
across  age  groups),  the  increased  availability  of  processing  resources  should 
facilitate  affective  sensitivity  in  two  ways:  1)  children  should  begin  to  detect 
paralinguistic  variations  in  acoustic  variables  which  also  provide  prosodic 
information,  and  2)  proficiency  in  the  coordination  of  conflicting  cues  (lexical  and 
paralinguistic)  should  increase.  The  ability  to  detect  paralinguistic  variations  in 
acoustic  features  such  as  F0  (as  demonstrated  by  some  7-year-olds  and  most
10-year-olds  in  the  filtered  speech  context)  may  be  causally  related  to  the
coordination  of  conflicting  cues  or  may  simply  be  indicative  of  a general
reallocation  of  processing  resources. 

Evidence  from  the  psychoacoustic  and  speech  perception  literature  suggests
that,  at  the  level  of  extracting  specific  acoustic  information  from  the  speech  signal,
processing  limitations  could  play  a major  role.  In  principle,  F0,  duration,  and
intensity  could  be  extracted  from  the  same  global  acoustic  source,  the  speech
waveform  envelope  (Cole  & Scott,  1974;  Licklider,  1959).  Recent  empirical
evidence  suggests  that  adults  extract  intelligible  speech  from  the  waveform 
envelope  (Robert  Shannon,  personal  communication,  April  4,  1994)  and  that 
analysis  of  the  waveform  envelope  may  be  a fundamental  auditory  process  (Berg, 
1994).  The  waveform  envelope  is  a product  of  variations  in  amplitude  over  time 
and  is  a source  of  acoustic  information  available  both  pre-  and  postnatally. 
Young  children  have  extensive  exposure  to  this  acoustic  source  and  it  is  apparent 
that  capacity  considerations  might  limit  the  extent  to  which  the  waveform  is 
exploited  to  provide  both  lexical  and  paralinguistic  cues.  Whether  children,  in 
fact,  extract  cues  from  the  waveform  envelope  is  a matter  of  speculation,  albeit
one  with  interesting  implications  for  the  current  model. 
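
As an illustration of the statement that the waveform envelope is a product of variations in amplitude over time, the following minimal sketch estimates an envelope by rectifying a signal and smoothing it with a moving average. It is not the procedure used in the present study or in the cited work; the function names (amplitude_envelope, synth_tone), the 20 ms window, and the synthetic test tone are assumptions introduced only for this example, and other estimators (e.g., one based on the Hilbert transform) are equally plausible.

```python
import math


def amplitude_envelope(samples, sample_rate, window_ms=20.0):
    """Estimate the amplitude envelope by full-wave rectification
    followed by a centered moving average (hypothetical helper)."""
    win = max(1, int(sample_rate * window_ms / 1000.0))
    half = win // 2
    rectified = [abs(s) for s in samples]

    # Prefix sums let each windowed mean be computed in constant time.
    prefix = [0.0]
    for value in rectified:
        prefix.append(prefix[-1] + value)

    n = len(rectified)
    envelope = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope


def synth_tone(sample_rate, duration_s, freq_hz, mod_hz):
    """Synthetic stand-in for speech: a sine carrier whose amplitude
    rises and falls slowly (illustrative only, not real speech)."""
    n = int(sample_rate * duration_s)
    out = []
    for i in range(n):
        t = i / sample_rate
        gain = 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_hz * t))
        out.append(gain * math.sin(2.0 * math.pi * freq_hz * t))
    return out


if __name__ == "__main__":
    tone = synth_tone(8000, 1.0, 200.0, 2.0)
    env = amplitude_envelope(tone, 8000)
    # The envelope is high where the slow modulation peaks (0.25 s)
    # and near zero where the modulation reaches its trough (0.5 s).
    print(round(env[2000], 3), round(env[4000], 3))
```

The smoothed output tracks the slow rise and fall in amplitude while discarding the fine structure of the carrier, which is the sense in which the envelope is a global source of temporal and intensity information.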

Limitations  and  Directions  for  Future  Research 

While  the  available  empirical  data  provide  suggestive  evidence  for  the
proposed  constraints,  considerable  converging  evidence  is  required,  as  are
refinement  and  explication  of  the  current  model.  Two  limitations  specific  to  the
current  study  result  from  the  choice  of  speech  contexts  and  subject  populations.
First,  the  filtered  speech  context  was  designed  to  provide  primarily  F0  information.
However,  low-pass  filtering  is  an  imperfect  way  to  achieve  this,  and  some  lexical  content  is
inevitably  preserved  due  to  the  redundancy  of  natural  speech.  It  will  be  useful  in
future  research  to  utilize  speech  synthesis  techniques  to  isolate  the  F0  contour
and  to  determine  if  the  results  obtained  with  filtered  speech  are  replicable. 
Second,  the  current  study  employed  only  female  listeners  to  establish  the 
presence  of  a genuine  developmental  effect  before  examining  potential  gender 
differences.  The  possibility  of  gender  differences  in  the  developmental 
progression  of  paralinguistic  sensitivity  remains  to  be  explored. 

With  regard  to  both  the  constraint  on  dual  representations  and  the  linguistic 
constraint,  there  are  many  unanswered  questions.  To  begin,  both  constraints  are 
viewed  as  deriving  from  developmental  limitations  on  processing  capacity,  yet 
careful  task  analysis  is  required  before  the  capacity  requirements  of  paralinguistic 
processing  can  be  addressed.  This  is  an  extremely  complex  problem  given  the 
wide  range  of  cues  that  children  might  utilize  in  the  processing  of  natural  speech 
and  the  corresponding  range  of  potential  paralinguistic  cues.  A related  issue  is 
that  the  nature  of  prosodic  versus  paralinguistic  representation  in  terms  of  rapidity 
and  strength  of  encoding  remains  unspecified. 

Both  constraints  are  expected  to  have  clear  relationships  with  other 
developmental  phenomena.  The  constraint  on  dual  representations,  as
evidenced  in  children’s  interpretations  of  discrepancy,  should  be  corroborated  by
performance  on  other  tasks  such  as  false  belief,  appearance-reality,  and 
representational  change.  If  this  constraint  contributes  to  children’s  difficulty  with 
affective  discrepancy,  then  children  who  are  sensitive  to  paralinguistic  variations 
in  discrepant  messages  should  also  perform  well  on  other  tasks  requiring  dual 
representation.  Similarly,  the  onset  of  the  linguistic  constraint  is  expected  to  be 
related  to  the  acquisition  of  a receptive  vocabulary.  If  this  is  correct,  then  there 
should  be  a period  of  development  beginning  sometime  in  the  first  year  during 
which  sensitivity  to  paralanguage  in  cases  of  discrepancy  is  negatively  correlated 
with  receptive  vocabulary.  Empirical  evidence  is  required  on  each  of  these  issues 
to  assess  the  adequacy  of  this  account. 

Finally,  a working  model  based  on  general  auditory  mechanisms  impinged 
upon  by  domain-general  constraints  on  representation  and  domain-specific 
constraints  pertaining  to  language  has  been  proposed  to  account  for 
developmental  changes  in  paralinguistic  sensitivity.  An  impetus  for  developing 
such  a general  model  was  the  finding  that,  in  emotion  as  well  as  in  other 
domains,  discrepancy  between  lexical  and  paralinguistic  cues  results  in  some 
form  of  interference  for  young  children.  However,  social-cognitive  factors  specific 
to  the  domain  of  emotions  such  as  hypothesis  testing  about  the  reliability  of 
lexical  versus  paralinguistic  cues  are  probably  involved  in  children’s 
interpretations  of  affective  discrepancy.  For  example,  Harter  and  Buddin  (1987) 
demonstrated  that  children  between  7 and  11  years  of  age  find  it  difficult  to
generate  scenarios  in  which  a person  would  direct  conflicting  emotions  toward 
a single  target. 

That  children  engaged  in  social-cognitive  reasoning  was  evident  in 
spontaneous  comments  obtained  in  the  present  study.  In  response  to  one 
stimulus,  a 10-year-old  said,  "She  says,  ‘Don’t  play  around  with  me  child,’  but  she
sounds  happy.  She  doesn’t  want  to  be  mean,  but  she’s  a little  mad."  Another
10-year-old  asked,  "Are  these  mixed  emotions?"  Responses  which  indicated  an
awareness  of  lexical/paralinguistic  discrepancy  and  an  attempt  to  resolve  it  were
offered  by  several  7-  and  10-year-olds,  but  no  4-year-olds,  in  the  present  study.
Clearly,  at  least  some  of  the  7-  and  10-year-olds  were  reasoning  about  the  meaning
and  interpretation  of  discrepancy.  The  extent  to  which  this  reasoning  occurred 
may  have  been  underestimated  by  the  design  of  the  task. 

A complete  account  of  children’s  interpretations  of  affective  discrepancy  will 
necessarily  include  constraints  on  many  levels.  The  proposed  working  model  is 
based  on  a limited  number  of  constraints  that  are  generalizable  to  lexical  and 
paralinguistic  discrepancies  across  many  domains.  While  the  model  is  admittedly 
speculative,  some  clearly  falsifiable  predictions  have  been  offered.  This  model 
and  the  empirical  evidence  that  has  been  generated  in  its  support  have  important 
implications  for  affective  development. 
Specifically,  it  is  suggested  that  affect  and  language  share  resources  (an  idea
also  espoused  by  Bloom,  1991).  To  the  extent  that  this  is  true,  the  acquisition  of 
language  may  influence  the  types  of  cues  available  for  affect  detection  and  the 
way  that  attention  is  allocated  to  them.  This  is  particularly  important  in  that  it 
suggests  that  the  relevance  of  specific  cues  shifts  depending  upon  current 
developmental  priorities.  Vocal  paralanguage  may  be  an  important  percept  for 
the  prelinguistic  infant  and  for  older  children  and  adults.  However,  the  allocation 
of  limited  resources  to  linguistic  processing  early  in  language  acquisition  may  limit 
the  accessibility  of  vocal  paralanguage  in  cases  of  lexical/paralinguistic 
discrepancy.  Whether  other  sorts  of  paralanguage  (e.g.,  facial  expression)  are 
similarly  affected,  unaffected,  or  even  enhanced,  by  the  onset  of  language 
acquisition  is  a question  yet  to  be  addressed.  These  possibilities  have  profound 
implications  for  developmental  changes  in  the  understanding  of  affective 
discrepancy  as  they  suggest  that  discrepant  messages  are  processed  in  a 
fundamentally  different  way  by  young  children  relative  to  adults.  Further,  the 
mechanisms  that  are  posited  to  account  for  these  developmental  changes  have 
their  origin  outside  the  affective  domain.  Perhaps  the  most  important  contribution 
made  by  this  work  is  its  emphasis  on  the  theoretical  richness  that  results  from  an 
integrative  approach  to  developmental  issues  generally,  and  to  affective 
development  in  particular.

SUBSTITUTION  CODE  FOR  REITERANT  SPEECH


For  syllables  beginning  with 
b
hard  c
ma